Example

Schematic of Probability Calculation in Causal Language Modeling

This schematic illustrates the sequential probability calculation in Causal Language Modeling, a type of auto-regressive model. For a sequence x_0, x_1, ..., x_4, the model predicts each token based on the embeddings of the tokens that came before it. The process begins by setting the probability of the first token, Pr(x_0), to 1. Each subsequent token's probability is then conditioned on the embeddings of all prior tokens, as shown in the diagram below. This unidirectional, step-by-step dependency is a core feature of causal language models.

Token:        x_0         x_1          x_2               x_3                    x_4
               ↓           ↓            ↓                 ↓                      ↓
Probability:  Pr(x_0)=1   Pr(x_1|e_0)  Pr(x_2|e_0,e_1)   Pr(x_3|e_0,e_1,e_2)    Pr(x_4|e_0,e_1,e_2,e_3)
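The joint probability of the sequence is the product of these conditional probabilities (the chain rule). A minimal numeric sketch is below; the conditional probability values are hypothetical placeholders for illustration, not outputs of any trained model.

```python
import math

# Hypothetical per-step conditional probabilities, matching the schematic:
# Pr(x_0)=1, Pr(x_1|e_0), Pr(x_2|e_0,e_1), Pr(x_3|e_0,e_1,e_2), Pr(x_4|e_0,e_1,e_2,e_3)
cond_probs = [1.0, 0.5, 0.4, 0.8, 0.25]

def sequence_log_prob(probs):
    """Sum of log conditionals = log of the joint sequence probability.

    Working in log space avoids numerical underflow for long sequences.
    """
    return sum(math.log(p) for p in probs)

joint = math.exp(sequence_log_prob(cond_probs))
print(joint)  # 1.0 * 0.5 * 0.4 * 0.8 * 0.25 = 0.04
```

In practice each conditional would come from a softmax over the model's output logits at that position; summing log-probabilities rather than multiplying raw probabilities is the standard way to keep the computation numerically stable.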


Updated 2026-05-02


Ch.1 Pre-training - Foundations of Large Language Models
