Left-to-Right Factorization in Sequence Models
While the joint probability of a sequence can be mathematically factorized in reverse (right-to-left) or in any other order, left-to-right (in-order) factorization is generally preferred for language modeling. First, it matches the natural direction of reading, in which we anticipate upcoming words from those already seen. Second, factorizing in order lets the same language model assign probabilities to arbitrarily long sequences: extending a sequence by one token only requires multiplying the current joint probability by the conditional probability of the next token, $P(x_{t+1}, \ldots, x_1) = P(x_t, \ldots, x_1) \cdot P(x_{t+1} \mid x_t, \ldots, x_1)$. Third, for causally structured data, where future events cannot influence the past, predicting forward ($P(x_{t+1} \mid x_t)$) is usually an easier predictive modeling problem than predicting backward ($P(x_t \mid x_{t+1})$).
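The incremental multiplication in the second point can be sketched in a few lines of Python. This is a minimal illustration, not a real language model: it assumes a toy bigram model that conditions only on the previous token (a Markov simplification of the full conditional), and the corpus, the `<bos>` start marker, and the function names `train_bigram`, `extend_log_prob`, and `sequence_log_prob` are all hypothetical. Probabilities are accumulated in log space, so each multiplication becomes an addition.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Estimate P(next | prev) by counting adjacent token pairs (toy model, no smoothing)."""
    pair_counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<bos>"] + sentence  # "<bos>" is an assumed start-of-sequence marker
        for prev, nxt in zip(tokens, tokens[1:]):
            pair_counts[prev][nxt] += 1
    return {
        prev: {tok: count / sum(nxts.values()) for tok, count in nxts.items()}
        for prev, nxts in pair_counts.items()
    }


def extend_log_prob(log_prob, prev, nxt, model):
    """One left-to-right step: log P(x_1..x_{t+1}) = log P(x_1..x_t) + log P(x_{t+1} | x_t)."""
    p = model.get(prev, {}).get(nxt, 0.0)
    return log_prob + math.log(p) if p > 0 else float("-inf")


def sequence_log_prob(tokens, model):
    """Score a sequence of any length by repeatedly applying the same one-step update."""
    log_prob, prev = 0.0, "<bos>"
    for tok in tokens:
        log_prob = extend_log_prob(log_prob, prev, tok, model)
        prev = tok
    return log_prob


# Hypothetical three-sentence corpus, used only for illustration.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
model = train_bigram(corpus)

prefix = sequence_log_prob(["the", "cat"], model)
full = extend_log_prob(prefix, "cat", "sat", model)  # extend the prefix by one token
print(math.exp(prefix), math.exp(full))
```

Because each step reuses the running total, scoring a longer sequence never requires recomputing the prefix. Under a right-to-left or arbitrary-order factorization, appending a token at the end would not reduce to a single extra factor in the same way.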
Tags
D2L
Dive into Deep Learning @ D2L