Learn Before
Base Case for Sequence Probability
In the chain rule for sequence probability, the base case is the first token, x_0. Since there are no preceding tokens, its probability is its marginal probability, Pr(x_0). In many language models, this initial token is a deterministic start-of-sequence symbol, meaning its probability is fixed at 1, i.e., Pr(x_0) = 1. This assumption simplifies the joint probability calculation for the rest of the sequence. Specifically, the probability of the sequence following the initial token, Pr(x_1, ..., x_m | x_0), is unaffected when multiplied by Pr(x_0), as shown by the equation: Pr(x_0, x_1, ..., x_m) = Pr(x_0) · Pr(x_1, ..., x_m | x_0) = Pr(x_1, ..., x_m | x_0). This effectively means the calculation can start from the second token, conditioned on the first.
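A minimal sketch of this simplification (the token probabilities below are illustrative assumptions, not values from the text): because the start symbol contributes a factor of exactly 1, the joint probability is unchanged whether or not it is included in the product.

```python
import math

# Illustrative conditional probabilities Pr(x_t | x_0, ..., x_{t-1}).
# The start-of-sequence symbol is deterministic, so Pr(x_0) = 1.
conditionals = [1.0, 0.1, 0.5, 0.8]  # Pr(x_0), Pr(x_1|x_0), Pr(x_2|x_0,x_1), ...

# Joint probability via the chain rule: the product of all factors.
joint = math.prod(conditionals)

# Dropping the deterministic first factor gives the same result,
# so the calculation can start from the second token.
joint_from_second = math.prod(conditionals[1:])

assert joint == joint_from_second
```

Multiplying by 1.0 is exact in floating point, so the two products are identical bit-for-bit, not merely close.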

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Base Case for Sequence Probability
Joint Probability of a Generated Sequence using the Chain Rule
Relationship Between Joint, Conditional, and Marginal Log-Probabilities of Sequences
Derivation of Sequence Log-Probability via Chain Rule
Logarithmic Form of the Chain Rule for Sequence Probability
Formula for an Impossible Initial Event
A language model is tasked with calculating the total probability of the three-token sequence 'the cat sat'. The model provides the following probability estimates:
- The probability of the first token is Pr("the") = 0.1
- The probability of the second token, given the first, is Pr("cat" | "the") = 0.5
- The probability of the third token, given the first two, is Pr("sat" | "the", "cat") = 0.8
Using the principle that the joint probability of a sequence is the product of the conditional probabilities of its components, what is the joint probability Pr("the", "cat", "sat")?
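A quick sketch of the chain-rule arithmetic this question calls for, using the estimates given above:

```python
# Chain rule: Pr("the","cat","sat")
#   = Pr("the") * Pr("cat" | "the") * Pr("sat" | "the","cat")
p_the = 0.1
p_cat_given_the = 0.5
p_sat_given_the_cat = 0.8

joint = p_the * p_cat_given_the * p_sat_given_the_cat
print(round(joint, 3))  # 0.04
```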
Computational Stability of Sequence Probability
Which of the following expressions correctly decomposes the joint probability of a four-token sequence (x₁, x₂, x₃, x₄) using the chain rule of probability?
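For reference, the standard left-to-right chain-rule decomposition of a four-token joint probability is:

```latex
\Pr(x_1, x_2, x_3, x_4) = \Pr(x_1)\,\Pr(x_2 \mid x_1)\,\Pr(x_3 \mid x_1, x_2)\,\Pr(x_4 \mid x_1, x_2, x_3)
```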
Learn After
Log-Likelihood Objective for Language Model Training
A language model calculates the joint probability of a sequence of tokens (x_0, x_1, ..., x_m). The first token, x_0, is a special, deterministic start-of-sequence symbol. How does the nature of this specific first token typically affect the overall calculation of the sequence's joint probability?
Calculating Sequence Probability with a Start Token
Analyzing a Language Model's Sequence Probability