Comparing Model Alignment Techniques
A machine learning team is deciding between two methods to align a language model with human preferences.
Method A involves using a reward model to score multiple generated outputs for a given prompt, selecting only the highest-scoring output, and then fine-tuning the language model on a large dataset of these 'best' prompt-output pairs.
Method B involves using the reward model's scores as a reward signal to directly update the language model's policy using a reinforcement learning algorithm.
Explain the primary trade-off the team is facing by describing the main advantage of Method A over Method B.
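To ground the comparison, here is a minimal sketch of the two training objectives in Python. The callables `logprob` (the language model's log-probability of an output given a prompt) and `reward` (the reward model's score) are hypothetical stand-ins for the team's actual models, not a real API; practical implementations of Method B typically use a more elaborate RL algorithm such as PPO.

```python
def method_a_loss(logprob, prompt, best_output):
    # Method A: ordinary supervised fine-tuning on the reward-selected pair.
    # Minimizing -log p(best_output | prompt) is the standard, stable
    # cross-entropy objective of supervised learning.
    return -logprob(prompt, best_output)


def method_b_loss(logprob, reward, prompt, sampled_output):
    # Method B: a REINFORCE-style policy-gradient step in which the reward
    # model's score directly scales the update for a freshly sampled output.
    # (Production RLHF systems usually add stabilizers, e.g., PPO clipping.)
    return -reward(prompt, sampled_output) * logprob(prompt, sampled_output)
```

Note the structural difference the sketch exposes: Method A's update depends only on a fixed, pre-built dataset, while Method B's depends on fresh samples drawn from the current policy at every step.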
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A development team aims to align a large language model with human preferences. Their methodology, sketched in code after this question, is as follows:
- For each input prompt, generate 16 different responses from the model.
- Use a pre-trained 'reward model' to assign a quality score to each of the 16 responses.
- Select only the single highest-scoring response for that prompt.
- Compile a new dataset consisting of thousands of these prompt-and-best-response pairs.
- Fine-tune the original language model on this new dataset using standard supervised learning methods.
Which statement most accurately evaluates this team's approach?
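For concreteness, a minimal sketch of the listed pipeline; `generate` and `reward_model` are hypothetical callables standing in for the team's language model and reward model, and the returned pairs would then feed standard supervised fine-tuning.

```python
def build_best_of_n_dataset(prompts, generate, reward_model, n=16):
    """Best-of-n filtering: sample n responses per prompt, keep the top scorer."""
    dataset = []
    for prompt in prompts:
        responses = [generate(prompt) for _ in range(n)]
        best = max(responses, key=lambda r: reward_model(prompt, r))
        dataset.append({"prompt": prompt, "response": best})
    return dataset
```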
Choosing an Alignment Strategy for a Startup