1Cademy - Left-to-Right Factorization in Sequence Models

Learn Before

Language Model

Concept


While the chain rule lets the joint probability of a sequence $P(x_1, \ldots, x_T)$ be factorized in reverse (right-to-left) order or in any arbitrary order, left-to-right (in-order) factorization, $P(x_1, \ldots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_{t-1}, \ldots, x_1)$, is generally preferred for language modeling. First, it matches the natural human habit of anticipating upcoming words while reading. Second, factorizing in order lets the same language model assign probabilities to arbitrarily long sequences: the probability of a prefix is extended one token at a time by multiplying it by the conditional probability of the next token, $P(x_{t+1}, \ldots, x_1) = P(x_{t}, \ldots, x_1) \cdot P(x_{t+1} \mid x_{t}, \ldots, x_1)$. Third, for causally structured data, where future events cannot influence the past, predicting forward ($P(x_{t+1} \mid x_t)$) is often an easier modeling problem than predicting backward ($P(x_t \mid x_{t+1})$).
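The first two points can be checked numerically. The sketch below is a minimal illustration, assuming a small, made-up joint distribution over length-3 sequences from a two-token vocabulary; the table `JOINT` and the helper names `marginal`, `conditional`, `factorized_prob`, and `prefix_probs` are hypothetical and not part of any library. It derives conditionals from the joint table, confirms that left-to-right, right-to-left, and arbitrary-order chain-rule products all recover the same joint probability, and then extends a prefix probability one token at a time, as in the recursion above. A real language model would learn the conditionals directly rather than read them off a joint table, and this sketch does not address the third point about causal data.

```python
import math
from itertools import product

# Hypothetical toy joint distribution over length-3 sequences from {"a", "b"}.
# The numbers are invented for illustration; they only need to sum to 1.
VOCAB = ("a", "b")
JOINT = {
    ("a", "a", "a"): 0.10,
    ("a", "a", "b"): 0.05,
    ("a", "b", "a"): 0.20,
    ("a", "b", "b"): 0.05,
    ("b", "a", "a"): 0.15,
    ("b", "a", "b"): 0.15,
    ("b", "b", "a"): 0.10,
    ("b", "b", "b"): 0.20,
}
# Sanity checks: every sequence has a probability and the table is normalized.
assert set(JOINT) == set(product(VOCAB, repeat=3))
assert math.isclose(sum(JOINT.values()), 1.0)


def marginal(fixed):
    """P(x_i = v for every (i, v) in `fixed`), summing the joint over all other positions."""
    return sum(p for seq, p in JOINT.items()
               if all(seq[i] == v for i, v in fixed.items()))


def conditional(pos, value, given):
    """P(x_pos = value | x_j = v for (j, v) in `given`), by the definition of conditional probability."""
    return marginal({**given, pos: value}) / marginal(given)


def factorized_prob(seq, order):
    """Chain-rule product of conditionals, visiting the positions of `seq` in `order`."""
    prob, given = 1.0, {}
    for pos in order:
        prob *= conditional(pos, seq[pos], given)
        given[pos] = seq[pos]
    return prob


def prefix_probs(seq):
    """Running P(x_1, ..., x_t) for t = 1..T, extending the prefix left to right."""
    probs, prob = [], 1.0
    for t in range(len(seq)):
        history = dict(enumerate(seq[:t]))  # positions 0..t-1 already observed
        prob *= conditional(t, seq[t], history)
        probs.append(prob)
    return probs


if __name__ == "__main__":
    seq = ("a", "b", "a")
    for name, order in [("left-to-right", (0, 1, 2)),
                        ("right-to-left", (2, 1, 0)),
                        ("arbitrary", (1, 2, 0))]:
        print(name, factorized_prob(seq, order))
    print("joint", JOINT[seq])
    print("prefixes", prefix_probs(seq))
```

All three orderings print the same value as the joint entry (up to floating-point rounding), which is the sense in which every order is mathematically valid. Only the left-to-right version, however, produces each intermediate prefix probability as a by-product, which is what makes scoring sequences of arbitrary length convenient.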

Updated 2026-05-13

References

Dive into Deep Learning