Concept

SFT as Language Model Training on Concatenated Sequences

Supervised Fine-Tuning (SFT) can be framed as a standard language model training process: the input x and output y are concatenated into a single sequence, and the model is trained with the usual next-token prediction objective on log Prθ(seq_x,y). Although the model processes the entire concatenated sequence, the training loss is computed exclusively over the tokens of y. Summing the per-token terms log Prθ(y_t | x, y_<t) over those positions is exactly maximizing the conditional log-probability log Prθ(y|x), which aligns the training objective with the goal of predicting the output given the input.
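A minimal sketch of this loss-masking idea, assuming per-position logits over a toy vocabulary and the common convention of marking masked positions with a label of -100 (the helper `masked_nll` and all values here are illustrative, not from the source). For clarity, the example treats row t of the logits as the prediction for token t and omits the one-position label shift used in real causal LM training:

```python
import math

def masked_nll(logits, labels, ignore_index=-100):
    """Average negative log-likelihood, counting only unmasked positions."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # prompt (x) tokens contribute no loss
        m = max(row)  # stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[label]
        count += 1
    return total / count

# Toy setup: vocabulary of 4, prompt x has 2 tokens, output y has 2 tokens.
prompt_ids = [1, 3]
response_ids = [0, 2]
input_ids = prompt_ids + response_ids          # concatenated seq_{x,y}
labels = [-100] * len(prompt_ids) + response_ids  # mask the x portion

logits = [[0.1, 2.0, 0.3, 0.2],   # position in x (masked out of the loss)
          [1.5, 0.2, 0.1, 0.9],   # position in x (masked out of the loss)
          [3.0, 0.1, 0.2, 0.1],   # predicts y_1 = 0
          [0.2, 0.1, 2.5, 0.3]]   # predicts y_2 = 2

loss = masked_nll(logits, labels)
```

Because the prompt positions are masked, the loss equals the negative log-probability averaged over the y tokens alone, i.e. an estimate of -log Pr(y|x) per token.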


Updated 2025-10-06


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
