1Cademy - An engineering team is refining a large language model to be more helpful and harmless. They use a training process where the model generates responses, receives a quality score for each response, and then updates its internal decision-making function, known as the policy. What specific, adjustable components of the model are being directly modified during this policy update?
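The generate-score-update loop described above can be illustrated with a minimal sketch. It assumes a toy softmax policy over three canned responses and a hypothetical scoring function `score_response` that stands in for the quality score. The names `RESPONSES`, `score_response`, `policy_probs`, and `reinforce_step` are all hypothetical. The question does not say which update algorithm the team uses, so a plain REINFORCE policy-gradient step is swapped in here for illustration. The point is that the policy's probabilities are computed from parameters `theta`, and the update step changes only `theta`. In a real LLM, the policy is computed by a neural network, and its weights and biases play the role of `theta`.

```python
import math
import random

# Hypothetical candidate responses the toy "model" can choose from.
RESPONSES = ["helpful and safe", "helpful but unsafe", "unhelpful"]


def score_response(response: str) -> float:
    """Hypothetical quality score (a stand-in for a reward model)."""
    return {"helpful and safe": 1.0, "helpful but unsafe": -1.0, "unhelpful": 0.0}[response]


def policy_probs(theta: list[float]) -> list[float]:
    """Softmax policy pi_theta: action probabilities are computed from the parameters theta."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(theta: list[float], lr: float = 0.1, rng=random) -> list[float]:
    """One REINFORCE update: sample a response, score it, and adjust theta.

    theta is the only thing that changes; the policy changes because theta does.
    """
    probs = policy_probs(theta)
    action = rng.choices(range(len(theta)), weights=probs)[0]
    reward = score_response(RESPONSES[action])
    # Gradient of log softmax(theta)[action] w.r.t. theta is one_hot(action) - probs.
    return [
        t + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (t, p) in enumerate(zip(theta, probs))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0, 0.0]  # the adjustable parameters of the toy policy
    for _ in range(500):
        theta = reinforce_step(theta, lr=0.1, rng=rng)
    print("theta:", [round(t, 3) for t in theta])
    print("policy:", [round(p, 3) for p in policy_probs(theta)])
```

Each step makes a positively scored response more likely by raising its own parameter, and a negatively scored response less likely by lowering its parameter. The response selection itself never changes directly; it shifts only through `theta`.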

Learn Before

Parameterization of the LLM Policy

Multiple Choice


Updated 2025-09-28
