Multiple Choice

An engineering team is refining a large language model to be more helpful and harmless. They use a training process in which the model generates responses, receives a quality score for each response, and then updates its internal decision-making function, known as the 'policy'. Which specific, adjustable components of the model are directly modified during this policy update?
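The generate, score, update loop in the question can be sketched with a toy policy. This is a minimal illustration under stated assumptions, not the course's method: the "model" is a hypothetical softmax policy with one trainable logit per candidate response (standing in for an LLM's weights), the reward values are made up, and the update rule is plain REINFORCE with a baseline, one simple policy-gradient method (RLHF pipelines commonly use more elaborate algorithms such as PPO). The names `softmax`, `reinforce_step`, and `train` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(theta, rewards, rng, lr=0.5):
    """One REINFORCE update on a toy softmax policy (hypothetical setup).

    theta:   trainable parameters, one logit per candidate response.
    rewards: hypothetical quality score for each candidate response.
    """
    probs = softmax(theta)
    # 1. The policy "generates" a response by sampling from its distribution.
    action = rng.choices(range(len(theta)), weights=probs)[0]
    # 2. The response receives a score; subtracting the expected score
    #    (a baseline) reduces variance without biasing the gradient.
    baseline = sum(p * r for p, r in zip(probs, rewards))
    advantage = rewards[action] - baseline
    # 3. d/d theta_j of log pi(action) is 1[j == action] - pi_j, so the
    #    update changes the parameters theta themselves.
    return [
        t + lr * advantage * ((1.0 if j == action else 0.0) - p)
        for j, (t, p) in enumerate(zip(theta, probs))
    ]


def train(rewards, steps=500, seed=0):
    """Run repeated generate/score/update steps from a uniform policy."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # untrained policy: uniform over responses
    for _ in range(steps):
        theta = reinforce_step(theta, rewards, rng)
    return theta


if __name__ == "__main__":
    rewards = [0.1, 0.9, 0.3]  # hypothetical scores from a reward model
    theta = train(rewards)
    print("parameters:", [round(t, 2) for t in theta])
    print("policy:", [round(p, 2) for p in softmax(theta)])
```

After training, the probability mass shifts toward the highest-scoring response, and the only thing that changed to produce that shift is the parameter vector `theta`; the sampling and scoring code stays fixed.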



Updated 2025-09-28


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science