Learn Before
Parameterization of the LLM Policy
In the context of reinforcement learning, the Large Language Model (LLM) acts as the policy: the function that maps a state (the prompt plus tokens generated so far) to a probability distribution over next tokens. This policy is defined by a set of parameters, commonly denoted θ, which consist of the neural network's weights and biases. These parameters are adjusted during training to optimize the model's behavior.
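To make the idea concrete, here is a minimal, purely illustrative sketch (not how a real LLM is implemented): a one-layer softmax "policy" over a tiny action space, where the dictionary theta holds exactly the adjustable components the passage describes (weights and biases), and a single REINFORCE-style step nudges θ toward actions that received positive reward. All names (policy, reinforce_update) are hypothetical.

```python
import math
import random

def init_theta(n_features, n_actions, seed=0):
    # theta = the policy's parameters: a weight matrix and a bias vector.
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_actions)]
               for _ in range(n_features)]
    biases = [0.0] * n_actions
    return {"weights": weights, "biases": biases}

def policy(theta, state):
    # pi_theta(a | s): softmax over linear logits computed from the state.
    logits = [
        sum(s * w for s, w in zip(state, col)) + b
        for col, b in zip(zip(*theta["weights"]), theta["biases"])
    ]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(theta, state, action_idx, reward, lr=0.1):
    # One policy-gradient step: reward * d(log pi(a|s))/d(theta).
    # For softmax logits, d(log pi(a))/d(logit_i) = 1{i == a} - pi(i).
    probs = policy(theta, state)
    for i in range(len(theta["biases"])):
        grad_logit = (1.0 if i == action_idx else 0.0) - probs[i]
        theta["biases"][i] += lr * reward * grad_logit
        for f, s in enumerate(state):
            theta["weights"][f][i] += lr * reward * grad_logit * s
    return theta
```

After a positive-reward update for a chosen action, the policy assigns that action a higher probability in the same state; this is the sense in which "updating the policy" means adjusting the weights and biases θ.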
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Parameterization of the LLM Policy
A language model is being trained to generate helpful and harmless responses using feedback from a separate quality-assessment model. Arrange the following events into the correct chronological sequence for a single iterative step of this training loop.
An AI team is fine-tuning a language model to write compelling short stories, generated one token at a time. However, the model's outputs are becoming repetitive and nonsensical. Their current process has a reward model evaluate the entire 500-token story only after it is fully completed, providing a single quality score at the very end. Which of the following best explains why this training setup is failing?
In the iterative process of refining a language model using feedback, different components of the model's operation correspond to formal concepts from learning theory. Match each formal concept to its specific implementation in this language model training scenario.
Learn After
An engineering team is refining a large language model to be more helpful and harmless. They use a training process where the model generates responses, receives a quality score for each response, and then updates its internal decision-making function, known as the 'policy'. What specific, adjustable components of the model are being directly modified during this policy update?
The Role of Parameters in an LLM Policy
Analyzing Behavioral Changes in a Trained LLM