Formula

Maximum Likelihood Estimation (MLE) as the Objective for Supervised Fine-Tuning

In Supervised Fine-Tuning (SFT), the training objective is to maximize the probability of the model generating a "gold-standard" or ground-truth output ($y$) given a specific input ($x$). This is achieved through Maximum Likelihood Estimation (MLE), where the model's parameters are adjusted to make its predicted token distributions align as closely as possible with the one-hot encoded distributions of the correct response. The formal objective is to maximize the conditional probability: $\max \Pr(y \mid x)$.
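
To make the objective concrete, the following is a minimal sketch of how the likelihood of a target response could be scored. It assumes an autoregressive model, so that Pr(y|x) factorizes into a product of per-token probabilities and maximizing it is equivalent to minimizing the summed negative log-likelihood (the cross-entropy against one-hot targets). The function names `softmax` and `sequence_nll`, the three-token vocabulary, and the logit values are illustrative assumptions, not taken from the source.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_nll(step_logits, target_ids):
    """Negative log-likelihood of the target tokens: -sum_t log Pr(y_t | x, y_<t).

    step_logits[t] holds the model's scores over the vocabulary at step t
    (hypothetical values here; a real model would compute them from x and y_<t).
    target_ids[t] is the index of the ground-truth token at step t.
    """
    nll = 0.0
    for logits, target in zip(step_logits, target_ids):
        probs = softmax(logits)
        # Cross-entropy with a one-hot target reduces to -log p(correct token).
        nll -= math.log(probs[target])
    return nll

# Toy example (assumed values): vocabulary of 3 tokens, response of 2 tokens.
step_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
target_ids = [0, 2]

nll = sequence_nll(step_logits, target_ids)
likelihood = math.exp(-nll)  # Pr(y | x) under the toy distributions
```

Lowering `nll` by adjusting the parameters that produce the logits raises `likelihood`, which is the sense in which SFT performs maximum likelihood estimation.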

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
