Short Answer

Comparing Model Alignment Techniques

A machine learning team is deciding between two methods to align a language model with human preferences.

Method A uses a reward model to score multiple generated outputs for each prompt, keeps only the highest-scoring output, and then fine-tunes the language model on the resulting dataset of 'best' prompt-output pairs (sometimes called best-of-n or rejection-sampling fine-tuning).
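For concreteness, Method A's data-collection step might be sketched as follows; `generate` and `reward_model` here are hypothetical stand-ins for a real LM sampler and a trained reward model, not part of the question:

```python
import random

def generate(prompt, n):
    # Placeholder: sample n candidate completions for the prompt.
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def reward_model(prompt, output):
    # Placeholder: score how well `output` answers `prompt`.
    return random.random()

def collect_best_of_n(prompts, n=4):
    """Best-of-n filtering: keep only the top-scoring output per prompt."""
    dataset = []
    for prompt in prompts:
        candidates = generate(prompt, n)
        best = max(candidates, key=lambda o: reward_model(prompt, o))
        dataset.append((prompt, best))  # discard the other n-1 candidates
    return dataset  # the LM is then fine-tuned on this dataset with ordinary SFT

pairs = collect_best_of_n(["Explain RLHF briefly."], n=4)
```

The key point the sketch makes concrete is that the reward model is used only offline, as a filter; the subsequent training step is plain supervised fine-tuning.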

Method B uses the reward model's scores as a reward signal to directly update the language model's policy with a reinforcement learning algorithm (e.g., PPO).

Explain the primary trade-off the team is facing by describing the main advantage of Method A over Method B.

Updated 2025-10-10

Tags

Ch.5 Inference - Foundations of Large Language Models


Computing Sciences

Analysis in Bloom's Taxonomy
