Formula

Derivation of Sequence Log-Probability via Chain Rule

The log-probability of a sequence $\mathbf{x} = (x_0, \dots, x_m)$ is derived by applying the logarithm to the product form of the chain rule of probability. Because the logarithm of a product is the sum of the logarithms, this step transforms the product of conditional probabilities into a sum, which is more numerically stable to compute. The derivation proceeds as follows:

$$
\begin{aligned}
\log \text{Pr}(\mathbf{x}) &= \log \text{Pr}(x_0 \dots x_m) \\
&= \log [\text{Pr}(x_0) \text{Pr}(x_1|x_0) \cdots \text{Pr}(x_m|x_0 \dots x_{m-1})] \\
&= \log \text{Pr}(x_0) + \sum_{j=1}^{m} \log \text{Pr}(x_j|\mathbf{x}_{<j})
\end{aligned}
$$

Here $\mathbf{x}_{<j}$ denotes the prefix $(x_0, \dots, x_{j-1})$. This decomposition is a foundational step for formulating the log-likelihood objective in language models.
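As a rough illustration of why the sum form is preferred, the sketch below computes a sequence log-probability from a list of conditional probabilities. The function name `sequence_log_prob` and the probability values are hypothetical, chosen only for this example; in practice the conditionals would come from a model.

```python
import math

def sequence_log_prob(cond_probs):
    """Sum the logs of Pr(x_0), Pr(x_1|x_0), ..., Pr(x_m|x_{<m}).

    cond_probs[0] is Pr(x_0); cond_probs[j] is Pr(x_j | x_{<j}) for j >= 1.
    """
    return sum(math.log(p) for p in cond_probs)

# Hypothetical conditional probabilities for a 4-token sequence.
cond_probs = [0.5, 0.25, 0.8, 0.1]

# Sum of logs agrees with the log of the product (chain rule).
log_p = sequence_log_prob(cond_probs)
assert math.isclose(log_p, math.log(math.prod(cond_probs)))

# For a long sequence the raw product underflows in double precision,
# while the sum of logs remains a finite number.
long_probs = [0.01] * 200
product = math.prod(long_probs)          # too small to represent as a float
log_sum = sequence_log_prob(long_probs)  # finite
print(product, log_sum)
```

With 200 factors of 0.01 the true product is 1e-400, below the smallest positive double, so the product form loses all information while the log form does not.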

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences