Formula

SFT Objective as Maximizing Joint Log-Probability of Concatenated Sequences

When Supervised Fine-Tuning (SFT) is framed as a standard language model training task, the objective is to find the parameters $\tilde{\theta}$ that maximize the sum of the log-probabilities of the concatenated input-output sequences across the entire dataset $\mathcal{D}$. This is formally expressed as:

$$\tilde{\theta} = \arg\max_{\theta} \sum_{(\mathbf{x},\mathbf{y}) \in \mathcal{D}} \log \mathrm{Pr}_{\theta}(\mathrm{seq}_{\mathbf{x},\mathbf{y}})$$

By taking $\log \mathrm{Pr}_{\theta}(\mathrm{seq}_{\mathbf{x},\mathbf{y}})$ as the objective function, SFT can be implemented using standard LLMs, treating the combined input $\mathbf{x}$ and output $\mathbf{y}$ as a single sequence for the model to process.
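The objective above can be sketched in code. This is a minimal illustration, not a real training loop: `toy_next_token_probs` is a hypothetical stand-in for $\mathrm{Pr}_{\theta}(\cdot \mid \text{prefix})$, and the example only shows how the joint log-probability of the concatenated sequence decomposes into a sum of per-token log-probabilities.

```python
import math

# Toy vocabulary and next-token model. A real LLM would replace
# toy_next_token_probs with the model's predicted distribution
# Pr_theta(. | prefix); here it is uniform purely for illustration.
VOCAB = ["hello", "world", "!"]

def toy_next_token_probs(prefix):
    p = 1.0 / len(VOCAB)
    return {tok: p for tok in VOCAB}

def joint_log_prob(x_tokens, y_tokens, next_token_probs):
    """log Pr_theta(seq_{x,y}): the input x and output y are
    concatenated into one sequence, and per-token log-probs are
    summed via the chain rule of probability."""
    seq = x_tokens + y_tokens
    total = 0.0
    for t in range(len(seq)):
        probs = next_token_probs(seq[:t])
        total += math.log(probs[seq[t]])
    return total

# The SFT objective sums this quantity over all (x, y) pairs in D;
# training would then choose theta to maximize it.
dataset = [(["hello"], ["world", "!"]), (["hello"], ["!"])]
sft_objective = sum(joint_log_prob(x, y, toy_next_token_probs)
                    for x, y in dataset)
```

Note that the same computation is what standard language model training performs; SFT differs only in that the sequences come from curated input-output pairs (and, in practice, the loss on the input tokens is often masked out).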

Updated 2026-05-01

Tags

- Ch.4 Alignment - Foundations of Large Language Models
- Foundations of Large Language Models
- Foundations of Large Language Models Course
- Computing Sciences