An engineering team is refining a large language model to be more helpful and harmless. They use a training process where the model generates responses, receives a quality score for each response, and then updates its internal decision-making function, known as the 'policy'. What specific, adjustable components of the model are being directly modified during this policy update?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
The Role of Parameters in an LLM Policy
Analyzing Behavioral Changes in a Trained LLM