Formula

Auto-regressive Decomposition of Conditional Log-Likelihood

The conditional log-likelihood, often used as an objective function in sequence modeling, is computed by decomposing the probability of the entire output sequence $\mathbf{y}$ into a product of conditional probabilities for each token. In log space, this product becomes a sum. Specifically, the log-probability of sequence $\mathbf{y}$ given input $\mathbf{x}$ is the sum of the log-probabilities of each token $y_i$, conditioned on the input $\mathbf{x}$ and all previously generated tokens $\mathbf{y}_{<i}$. The formula, parameterized by $\theta$, is:

$$\log \mathrm{Pr}_{\theta}(\mathbf{y} \mid \mathbf{x}) = \sum_{i=1}^{n} \log \mathrm{Pr}_{\theta}(y_i \mid \mathbf{x}, \mathbf{y}_{<i})$$

where $n$ is the length of the sequence $\mathbf{y}$.
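The decomposition can be verified numerically. The sketch below uses hypothetical per-token conditional probabilities (illustrative values, not taken from a real model) and checks that summing the per-token log-probabilities equals taking the log of the product of the conditionals:

```python
import math

# Hypothetical conditional probabilities Pr(y_i | x, y_<i) for a
# 4-token output sequence (illustrative values only).
token_probs = [0.9, 0.7, 0.8, 0.6]

# Auto-regressive decomposition: the sequence log-probability is the
# sum of the per-token conditional log-probabilities.
log_prob_sum = sum(math.log(p) for p in token_probs)

# Equivalently, the log of the product of the conditionals.
log_prob_product = math.log(math.prod(token_probs))

# The two are identical up to floating-point precision.
assert abs(log_prob_sum - log_prob_product) < 1e-12
print(log_prob_sum)
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause for long sequences.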


Updated 2026-05-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences