Learn Before
Multiple Choice
A development team aims to align a large language model with human preferences. Their methodology is as follows:
- For each input prompt, generate 16 different responses from the model.
- Use a pre-trained 'reward model' to assign a quality score to each of the 16 responses.
- Select only the single highest-scoring response for that prompt.
- Compile a new dataset consisting of thousands of these prompt-and-best-response pairs.
- Fine-tune the original language model on this new dataset using standard supervised learning methods.
Which statement most accurately evaluates this team's approach?
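The data-construction steps in the methodology above can be sketched as follows. This is a minimal illustration, not the team's actual code: `build_best_response_dataset`, `make_toy_generator`, and `toy_reward` are hypothetical names, and the toy generator and reward function are stand-ins for a real language model and a real pre-trained reward model. The final supervised fine-tuning step is not shown.

```python
import random
from typing import Callable, List, Tuple


def build_best_response_dataset(
    prompts: List[str],
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 16,
) -> List[Tuple[str, str]]:
    """For each prompt, sample n responses and keep only the top-scoring one."""
    dataset = []
    for prompt in prompts:
        # Sample n candidate responses from the (hypothetical) model.
        candidates = [generate(prompt) for _ in range(n)]
        # Score each candidate with the (hypothetical) reward model
        # and keep the single highest-scoring response.
        best = max(candidates, key=lambda response: score(prompt, response))
        dataset.append((prompt, best))
    return dataset


def make_toy_generator(seed: int = 0) -> Callable[[str], str]:
    """Assumed stand-in for sampling from a language model."""
    rng = random.Random(seed)

    def generate(prompt: str) -> str:
        return f"{prompt} -> answer " + "x" * rng.randint(1, 20)

    return generate


def toy_reward(prompt: str, response: str) -> float:
    """Assumed stand-in for a reward model: longer responses score higher."""
    return float(len(response))


if __name__ == "__main__":
    pairs = build_best_response_dataset(
        ["What is 2 + 2?", "Name a prime number."],
        make_toy_generator(),
        toy_reward,
    )
    for prompt, best in pairs:
        print(prompt, "|", best)
```

The resulting prompt-and-best-response pairs would then serve as the training set for the supervised fine-tuning step described in the last bullet.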
Updated 2025-09-28
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Choosing an Alignment Strategy for a Startup
Comparing Model Alignment Techniques