Learn Before
Choosing a Baseline for Preference Alignment
An AI research team is setting up a training process to align a large language model with human preferences. They are using an objective function that penalizes the model for deviating too far from a fixed, stable baseline policy. Two options are proposed for this baseline:
- The original, pre-trained base model, before any instruction tuning.
- A version of the model that has already undergone supervised fine-tuning (SFT) on high-quality conversational data.
Evaluate the potential consequences of choosing each option as the baseline. Which option is generally preferred, and why does this choice contribute to better training stability and final model quality?
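For concreteness, here is a minimal sketch of the kind of objective the question assumes: a DPO-style preference loss in which log-probabilities from a frozen reference (baseline) policy anchor the trained policy. This is not taken from the course materials; the PyTorch framing, the function name `dpo_loss`, its argument names, and the `beta=0.1` default are illustrative assumptions.

```python
# Illustrative sketch (not the course's code): a DPO-style preference loss
# where `ref_*` log-probabilities come from a frozen baseline policy.
# Which model supplies those reference log-probabilities (raw pre-trained
# vs. SFT) is exactly the design choice the question asks about.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Preference loss that implicitly penalizes drift from the reference policy."""
    # Log-ratio of the trained policy vs. the frozen reference for each response.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # The beta-scaled margin keeps the policy close to the baseline while
    # widening the gap between preferred and rejected responses.
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()
```

The choice of baseline determines which distribution this beta-scaled penalty anchors the policy to throughout training.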
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI development team is refining a pre-trained language model using a dataset of human preferences, where each example consists of a prompt, a preferred response, and a rejected response. As training progresses, they notice that while the model is learning to generate responses that align with the preferences, its general language quality is deteriorating; it produces more repetitive and nonsensical text. What is the most probable cause of this issue related to the optimization objective's design?
Choosing a Baseline for Preference Alignment
Selecting a Baseline for Policy Optimization
Conceptual Objective Function Assumed in DPO