1Cademy - Derivation of Sequence Log-Probability via Chain Rule


Learn Before

Chain Rule for Sequence Probability

Formula

Derivation of Sequence Log-Probability via Chain Rule

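The log-probability of a sequence $\mathbf{x} = (x_0, \dots, x_m)$ is obtained by taking the logarithm of the product form of the chain rule of probability. Because the logarithm of a product equals the sum of the logarithms, this step turns a product of conditional probabilities into a sum of log-probabilities. The sum is numerically more stable: multiplying many probabilities smaller than 1 quickly underflows in floating-point arithmetic, whereas adding their logarithms does not. The derivation proceeds as follows: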

$\log \text{Pr}(\mathbf{x}) = \log \text{Pr}(x_0 \dots x_m)$

$= \log [\text{Pr}(x_0) \text{Pr}(x_1|x_0) \cdots \text{Pr}(x_m|x_0 \dots x_{m-1})]$

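$= \log \text{Pr}(x_0) + \log \text{Pr}(x_1|x_0) + \cdots + \log \text{Pr}(x_m|x_0 \dots x_{m-1})$

$= \log \text{Pr}(x_0) + \sum_{j=1}^{m} \log \text{Pr}(x_j|\mathbf{x}_{<j})$

where $\mathbf{x}_{<j} = (x_0, \dots, x_{j-1})$ denotes the prefix of tokens preceding $x_j$.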
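As a sanity check, the identity can be verified numerically. The sketch below assumes a hypothetical toy model: `VOCAB`, `cond_prob`, and its "repeat the previous token with probability 0.7" rule are illustrative inventions, not part of the source. The sketch computes the log-probability as the sum in the last line of the derivation and compares it with the product of the same conditional probabilities.

```python
import math

# Hypothetical two-token vocabulary for illustration only.
VOCAB = ("a", "b")

def cond_prob(token, prefix):
    """Hypothetical toy model Pr(x_j | x_<j).

    x_0 is uniform over VOCAB; each later token repeats the previous
    token with probability 0.7 and switches with probability 0.3.
    """
    if not prefix:
        return 1.0 / len(VOCAB)
    return 0.7 if token == prefix[-1] else 0.3

def log_prob_chain_rule(x):
    """log Pr(x) = log Pr(x_0) + sum_{j=1}^{m} log Pr(x_j | x_<j)."""
    total = math.log(cond_prob(x[0], ()))
    for j in range(1, len(x)):
        total += math.log(cond_prob(x[j], x[:j]))
    return total

def prob_product(x):
    """Pr(x) as the product of the same conditional probabilities."""
    p = 1.0
    for j in range(len(x)):
        p *= cond_prob(x[j], x[:j])
    return p

x = ("a", "a", "b")
print(prob_product(x))                   # 0.5 * 0.7 * 0.3
print(math.exp(log_prob_chain_rule(x)))  # same value, via the sum of logs
```

Exponentiating the sum recovers the product, which is exactly what the derivation states.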

This decomposition is a foundational step for formulating the log-likelihood objective in language models.
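The sketch below shows why the sum form is preferred in practice. It assumes that per-token conditional probabilities have already been produced by some model; the helper names `sequence_log_prob` and `dataset_log_likelihood` and the probability values are hypothetical. For a long sequence, the direct product underflows to zero, while the sum of log-probabilities stays finite. The log-likelihood of a dataset is then simply the sum of its per-sequence log-probabilities.

```python
import math

def sequence_log_prob(token_probs):
    """Sum of log conditional probabilities for one sequence:
    log Pr(x_0) + sum_{j=1}^{m} log Pr(x_j | x_<j)."""
    return sum(math.log(p) for p in token_probs)

def dataset_log_likelihood(dataset):
    """Log-likelihood of a dataset: sum of per-sequence log-probabilities."""
    return sum(sequence_log_prob(seq) for seq in dataset)

# Hypothetical per-token probabilities: 2000 tokens, each with conditional probability 0.01.
long_seq = [0.01] * 2000

naive_product = math.prod(long_seq)  # 1e-4000 is below the double-precision range, so this underflows to 0.0
log_prob = sequence_log_prob(long_seq)  # finite: 2000 * log(0.01)

print(naive_product)
print(log_prob)
```

Training typically minimizes the negative of this quantity, the negative log-likelihood.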


Updated 2026-05-03

Contributors are:

Gemini AI

Who are from:

Google


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
