Essay

Choosing a Baseline for Preference Alignment

An AI research team is setting up a training process to align a large language model with human preferences. They are using an objective function that penalizes the model for deviating too far from a fixed, stable baseline policy; a standard form of this objective is sketched after the list below. Two options are proposed for this baseline:

  1. The original, pre-trained base model, before any instruction tuning.
  2. A version of the model that has already undergone supervised fine-tuning (SFT) on high-quality conversational data.
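
For concreteness, the prompt does not spell out the exact formula, but the usual KL-regularized RLHF objective it alludes to is

    \max_{\theta}\;
    \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\!\left[ r(x, y) \right]
    \;-\; \beta\, \mathbb{E}_{x \sim \mathcal{D}}\!\left[
        D_{\mathrm{KL}}\!\left( \pi_\theta(\cdot \mid x) \,\Vert\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)
    \right]

where \pi_{\mathrm{ref}} is the fixed baseline policy (option 1 or option 2), r(x, y) is the reward for response y to prompt x, and \beta sets how strongly deviation from the baseline is penalized.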

Evaluate the potential consequences of choosing each option as the baseline. Which option is generally preferred, and why does this choice contribute to better training stability and final model quality?
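
To ground the question, here is a minimal PyTorch-style sketch of how such a penalty is often applied in practice. The function name, tensor shapes, the beta value, and the choice of the single-sample log-ratio KL estimate are illustrative assumptions, not details given in the prompt.

    import torch

    def kl_penalized_reward(reward, policy_logprobs, ref_logprobs, beta=0.1):
        # reward:          (batch,) scalar reward per generated response
        # policy_logprobs: (batch, seq_len) log-probabilities of the sampled
        #                  tokens under the policy being trained
        # ref_logprobs:    (batch, seq_len) log-probabilities of the same
        #                  tokens under the frozen baseline policy
        # beta:            strength of the KL penalty (0.1 is illustrative)
        #
        # The per-token log-ratio, summed over the sequence, is a standard
        # single-sample estimate of KL(pi_theta || pi_ref) on this trajectory.
        kl_estimate = (policy_logprobs - ref_logprobs).sum(dim=-1)
        return reward - beta * kl_estimate

    # Example: two responses of four tokens each, with the policy assigning
    # slightly higher probability to its sampled tokens than the baseline does.
    reward = torch.tensor([1.0, 0.5])
    policy_lp = torch.log(torch.full((2, 4), 0.30))
    ref_lp = torch.log(torch.full((2, 4), 0.25))
    penalized = kl_penalized_reward(reward, policy_lp, ref_lp)

Swapping which model supplies ref_logprobs (the raw base model versus the SFT model) changes the behavior the penalty anchors the policy to, which is precisely the trade-off the question asks you to weigh.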


Updated 2025-10-03


Tags

  Ch.4 Alignment - Foundations of Large Language Models
  Foundations of Large Language Models
  Foundations of Large Language Models Course
  Computing Sciences
  Evaluation in Bloom's Taxonomy
  Cognitive Psychology
  Psychology
  Social Science
  Empirical Science
  Science