Causal Language Modeling

Causal language modeling, also known as standard language modeling, is an auto-regressive pre-training approach where tokens are predicted sequentially in their natural, fixed order in the text (typically left-to-right). For instance, a sequence of 5 tokens $x_0 x_1 x_2 x_3 x_4$ is generated in the order $x_0 \to x_1 \to x_2 \to x_3 \to x_4$. The overall sequence probability $\Pr(\mathbf{x})$ is the product of individual token probabilities conditioned on preceding tokens: $\Pr(x_0) \cdot \Pr(x_1|x_0) \cdot \Pr(x_2|x_0,x_1) \cdot \Pr(x_3|x_0,x_1,x_2) \cdot \Pr(x_4|x_0,x_1,x_2,x_3)$. Substituting $\mathbf{e}_i$, the embedding of token $x_i$ (a combination of its token and positional embeddings), the generation process is modeled as $\Pr(x_0) \cdot \Pr(x_1|\mathbf{e}_0) \cdot \Pr(x_2|\mathbf{e}_0,\mathbf{e}_1) \cdot \Pr(x_3|\mathbf{e}_0,\mathbf{e}_1,\mathbf{e}_2) \cdot \Pr(x_4|\mathbf{e}_0,\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3)$. Each prediction thus depends solely on past context.
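
To make the chain-rule factorization concrete, below is a minimal sketch of a one-layer causal Transformer in PyTorch (the text names no framework, so that choice is an assumption; `TinyCausalLM` and all sizes are hypothetical). It shows the two ingredients the paragraph describes: embeddings $\mathbf{e}_i$ built from token plus position, and a triangular attention mask so the prediction of $x_t$ sees only $\mathbf{e}_0, \dots, \mathbf{e}_{t-1}$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, MAX_LEN = 100, 32, 16  # toy sizes, chosen arbitrarily


class TinyCausalLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, DIM)     # token embeddings
        self.pos = nn.Embedding(MAX_LEN, DIM)   # positional embeddings
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(DIM, VOCAB)       # logits over the vocabulary

    def forward(self, x):                        # x: (batch, seq_len)
        t = x.size(1)
        e = self.tok(x) + self.pos(torch.arange(t))  # e_i = token + position
        # Causal mask: True entries are blocked, so position i attends only
        # to positions <= i, i.e. each prediction sees only past context.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        h = self.encoder(e, mask=mask)
        return self.head(h)                      # (batch, seq_len, VOCAB)


x = torch.randint(0, VOCAB, (1, 5))              # a sequence x_0 .. x_4
model = TinyCausalLM()
logits = model(x)

# Chain rule: log Pr(x) = sum_t log Pr(x_t | x_0 .. x_{t-1}).
# Logits at position t-1 score token x_t, so shift by one.
# (Pr(x_0) is usually handled by prepending a BOS token; omitted here.)
log_probs = F.log_softmax(logits[:, :-1], dim=-1)
log_pr_x = log_probs.gather(-1, x[:, 1:].unsqueeze(-1)).sum()
print(log_pr_x)  # log of Pr(x_1|e_0) * ... * Pr(x_4|e_0,e_1,e_2,e_3)
```

Negating this summed log-probability gives the standard next-token cross-entropy loss used during pre-training.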
