Learn Before
Model Initialization Strategy in RLHF
The first step in the RLHF training pipeline is to initialize the necessary models from existing checkpoints. Typically, the reward and value models are initialized from a pre-trained large language model, while the reference and target (policy) models are initialized from a model that has already undergone instruction fine-tuning. After initialization, the reference model's parameters are frozen and receive no updates during the subsequent training stages.
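The sketch below illustrates this initialization scheme using Hugging Face Transformers. It is a minimal illustration, not a full RLHF setup: the checkpoint names are placeholders, and giving the reward and value models a scalar output head via AutoModelForSequenceClassification with num_labels=1 is one common convention, assumed here for concreteness.

```python
# Minimal sketch of RLHF model initialization (hypothetical checkpoints).
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification

PRETRAINED_CKPT = "my-org/base-llm"       # placeholder: pre-trained LLM
SFT_CKPT = "my-org/base-llm-sft"          # placeholder: instruction fine-tuned LLM

# Reward and value models: initialized from the pre-trained LLM,
# with a single-output head that produces one scalar score per input.
reward_model = AutoModelForSequenceClassification.from_pretrained(
    PRETRAINED_CKPT, num_labels=1
)
value_model = AutoModelForSequenceClassification.from_pretrained(
    PRETRAINED_CKPT, num_labels=1
)

# Policy (target) and reference models: initialized from the
# instruction fine-tuned model.
policy_model = AutoModelForCausalLM.from_pretrained(SFT_CKPT)
reference_model = AutoModelForCausalLM.from_pretrained(SFT_CKPT)

# Freeze the reference model: it receives no gradient updates for the
# rest of training and serves only as a fixed comparison point.
reference_model.requires_grad_(False)
reference_model.eval()
```

Starting the policy and reference models from the same fine-tuned checkpoint means they begin identical; only the policy is updated afterward, so any divergence between the two reflects what RL training has changed.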

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Model Initialization Strategy in RLHF
A team is developing a large language model aligned with human preferences using a reinforcement learning approach. Arrange the following key phases of their training pipeline into the correct chronological order.
Diagnosing a Flawed Alignment Process
An AI development team is in the final stage of a three-part alignment process for their language model. They observe that the model's outputs are becoming increasingly nonsensical, even though the reward scores assigned during this final stage are consistently high. The team has already confirmed that the initial models were set up correctly and that the dataset of human preferences used in the second stage is high-quality. Based on this information, what is the most probable cause of the model's deteriorating performance?
Learn After
Data Collection for Reward Modeling in RLHF
A machine learning team is implementing a training process that uses human feedback to align a language model. They have access to two base models: a general-purpose pre-trained language model (Model A) and a version of that model that has been further fine-tuned on a set of instructions (Model B). For the first stage of their process, which of the following initialization plans is correct for the policy, reference, reward, and value models?
Rationale for Freezing the Reference Model in RLHF
Analyzing an RLHF Initialization Error