Formula

Token-Level Conditional Log-Probability in Supervised Fine-Tuning

The conditional log-probability $\log \mathrm{Pr}_{\theta}(\mathbf{y} \mid \mathbf{x})$ of an entire output sequence $\mathbf{y}$ given an input $\mathbf{x}$ is computed at the token level using the chain rule. For an output sequence of length $n$, the objective sums the log-probabilities of each token $y_i$, conditioned on both the input $\mathbf{x}$ and all preceding output tokens $\mathbf{y}_{<i}$:

$$\log \mathrm{Pr}_{\theta}(\mathbf{y} \mid \mathbf{x}) = \sum_{i=1}^{n} \log \mathrm{Pr}_{\theta}(y_i \mid \mathbf{x}, \mathbf{y}_{<i})$$

Maximizing this conditional log-probability is mathematically equivalent to minimizing the cross-entropy loss over the output tokens.
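To make the chain-rule decomposition concrete, here is a minimal sketch assuming PyTorch (the framework, the function name `sequence_log_prob`, and all tensor shapes are illustrative assumptions, not from the source). It sums the per-token log-probabilities and then checks numerically that maximizing this sum is the same as minimizing the summed cross-entropy loss.

```python
# Minimal sketch, assuming PyTorch; names and shapes are illustrative.
import torch
import torch.nn.functional as F

def sequence_log_prob(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Compute log Pr_theta(y | x) = sum_i log Pr_theta(y_i | x, y_<i).

    logits:     (n, vocab_size) scores, where row i is the model's output
                after conditioning on x and the preceding tokens y_<i.
    target_ids: (n,) gold token ids y_1 .. y_n.
    """
    log_probs = F.log_softmax(logits, dim=-1)                   # (n, vocab_size)
    # Pick out log Pr(y_i | x, y_<i) for each position i.
    token_log_probs = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum()                                # log Pr(y | x)

# Check the equivalence: the summed token log-probability equals the
# negative of the (summed) cross-entropy loss over the same tokens.
logits = torch.randn(4, 8)            # n = 4 output tokens, vocabulary of 8
y = torch.randint(0, 8, (4,))
assert torch.allclose(sequence_log_prob(logits, y),
                      -F.cross_entropy(logits, y, reduction="sum"))
```

In practice, SFT implementations compute exactly this cross-entropy over the output positions, typically masking out the prompt tokens $\mathbf{x}$ so that only the response tokens $y_1, \dots, y_n$ contribute to the loss.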
