Formula

Mathematical Formulation of the Supervised Fine-Tuning Objective

In supervised fine-tuning (SFT), the goal is to adjust pre-trained model parameters to maximize the conditional probability of the target output sequence $\mathbf{y}$ given the input sequence $\mathbf{x}$. Given pre-trained parameters $\hat{\theta}$ and a dataset $\mathcal{D}$ of input-output pairs, the objective is to find the optimized parameters $\tilde{\theta}$ by maximizing the sum of conditional log-probabilities:

$$\tilde{\theta} = \arg\max_{\hat{\theta}^+} \sum_{(\mathbf{x},\mathbf{y}) \in \mathcal{D}} \log \mathrm{Pr}_{\hat{\theta}^+}(\mathbf{y}\,|\,\mathbf{x})$$

where $\hat{\theta}^+$ denotes the fine-tuned parameters initialized from the pre-trained values $\hat{\theta}$. This formulation highlights that optimization starts from the pre-trained weights rather than from random initialization.
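Because maximizing $\log \mathrm{Pr}_{\hat{\theta}^+}(\mathbf{y}\,|\,\mathbf{x})$ is equivalent to minimizing the cross-entropy of the target tokens under the model's next-token distribution, the objective translates directly into a loss function. The following is a minimal PyTorch-style sketch under that assumption; `model` is assumed to be a causal language model returning next-token logits, and `sft_loss`, `x_ids`, and `y_ids` are illustrative names rather than any particular library's API.

```python
import torch
import torch.nn.functional as F

def sft_loss(model, x_ids, y_ids):
    """Negative log-likelihood of target tokens y given input tokens x.

    Assumptions (illustrative, not a specific library API):
    - `model(ids)` returns next-token logits of shape [batch, seq_len, vocab].
    - `x_ids` has shape [batch, |x|], `y_ids` has shape [batch, |y|].
    """
    ids = torch.cat([x_ids, y_ids], dim=1)     # full sequence [x; y]
    logits = model(ids)                        # [batch, |x|+|y|, vocab]

    # Logits at position t predict the token at position t+1, so the
    # predictions for y_1..y_n sit at positions |x|-1 .. |x|+|y|-2.
    start = x_ids.size(1) - 1
    pred = logits[:, start:-1, :]              # predictions for each y_t
    target = y_ids                             # ground-truth tokens of y

    # Summed cross-entropy over the target tokens equals
    # -log Pr(y | x) = -sum_t log Pr(y_t | x, y_<t).
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                           target.reshape(-1),
                           reduction="sum")
```

Minimizing this loss over all pairs in $\mathcal{D}$ by gradient descent, starting from the pre-trained weights $\hat{\theta}$, corresponds to the maximization above; note that only the target tokens in $\mathbf{y}$ contribute loss terms, while the input tokens in $\mathbf{x}$ serve purely as conditioning context.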
