
Chain Rule of Probability for Auto-regressive Language Models

Auto-regressive language models calculate the probability of a text sequence $\mathbf{x}$ by decomposing it into a product of conditional probabilities using the chain rule. The probability of each token $x_i$ is conditioned on all preceding tokens in the sequence. The general formula for a sequence $\mathbf{x} = (x_0, \ldots, x_{m-1})$ is:

$$\text{Pr}(\mathbf{x}) = \prod_{i=0}^{m-1} \text{Pr}(x_i \mid x_0, \ldots, x_{i-1})$$

For example, for a sequence of five tokens, this expands to:

$$\text{Pr}(\mathbf{x}) = \text{Pr}(x_0) \cdot \text{Pr}(x_1 \mid x_0) \cdot \text{Pr}(x_2 \mid x_0, x_1) \cdot \text{Pr}(x_3 \mid x_0, x_1, x_2) \cdot \text{Pr}(x_4 \mid x_0, x_1, x_2, x_3)$$
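The decomposition above can be sketched in code. The snippet below is a minimal illustration, not a real language model: `cond_prob` is a hypothetical stand-in that returns a uniform distribution over a toy vocabulary, where a trained model would instead score the next token given the prefix. The chain-rule product itself is implemented exactly as in the formula.

```python
import math

# Toy 4-token vocabulary (an assumption for illustration only).
VOCAB = ["a", "b", "c", "d"]

def cond_prob(token, prefix):
    """Stand-in for Pr(x_i | x_0, ..., x_{i-1}).
    A trained model would condition on the prefix; this stub is uniform."""
    return 1.0 / len(VOCAB)

def sequence_prob(tokens):
    """Chain rule: Pr(x) = prod_i Pr(x_i | x_0, ..., x_{i-1})."""
    prob = 1.0
    for i, tok in enumerate(tokens):
        prob *= cond_prob(tok, tokens[:i])
    return prob

def sequence_log_prob(tokens):
    """Same quantity in log space; summing log-probabilities
    avoids numerical underflow for long sequences."""
    return sum(math.log(cond_prob(tok, tokens[:i]))
               for i, tok in enumerate(tokens))

p = sequence_prob(["a", "b", "a"])  # (1/4)^3 = 0.015625
```

In practice, implementations work with `sequence_log_prob` rather than the raw product, since multiplying many probabilities below 1 quickly underflows floating-point precision.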

Updated 2025-10-08

Tags: Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences