1Cademy - Chain Rule of Probability for Word Sequences

Learn Before

N-gram Language Modeling

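A sequence of $n$ words can be written either as $w_1, \dots, w_n$ or, more compactly, as $w_{1:n}$. The joint probability of observing exactly this sequence is written $P(w_1, \dots, w_n)$ or $P(w_{1:n})$. Applying the chain rule of probability decomposes this joint probability into a product of conditional probabilities, in which each word is conditioned on all of the words that precede it:

$$P(w_{1:n}) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_{1:2}) \cdots P(w_n \mid w_{1:n-1}) = \prod_{k=1}^{n} P(w_k \mid w_{1:k-1})$$

For $k = 1$ the history $w_{1:0}$ is empty, so the first factor is simply $P(w_1)$. The decomposition is exact. It follows directly from the definition of conditional probability and involves no approximation.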
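The following is a minimal Python sketch that checks the decomposition numerically. It assumes a small, hypothetical toy distribution over three-word sentences (`TOY_DIST`). Every conditional $P(w_k \mid w_{1:k-1})$ is derived from that joint distribution as the ratio $P(w_{1:k}) / P(w_{1:k-1})$ of prefix probabilities. The helper names `prefix_prob`, `conditional`, and `chain_rule_prob` are illustrative only. Exact `Fraction` arithmetic is used so that the product can be compared with the joint probability without floating-point error.

```python
from fractions import Fraction
from typing import Dict, Sequence, Tuple

# Hypothetical toy joint distribution over 3-word sentences (an assumption for
# illustration only; the probabilities sum to 1).
TOY_DIST: Dict[Tuple[str, ...], Fraction] = {
    ("the", "cat", "sat"): Fraction(3, 10),
    ("the", "cat", "ran"): Fraction(1, 10),
    ("the", "dog", "sat"): Fraction(2, 10),
    ("a", "dog", "ran"): Fraction(4, 10),
}


def prefix_prob(prefix: Sequence[str]) -> Fraction:
    """P(w_{1:k}): total probability of sentences whose first k words match prefix."""
    k = len(prefix)
    target = tuple(prefix)
    return sum((p for sent, p in TOY_DIST.items() if sent[:k] == target), Fraction(0))


def conditional(word: str, history: Sequence[str]) -> Fraction:
    """P(w_k | w_{1:k-1}) = P(w_{1:k}) / P(w_{1:k-1}); an empty history gives P(w_1)."""
    return prefix_prob([*history, word]) / prefix_prob(history)


def chain_rule_prob(words: Sequence[str]) -> Fraction:
    """Multiply P(w_k | w_{1:k-1}) for k = 1..n, following the chain rule."""
    total = Fraction(1)
    for k, word in enumerate(words):
        if total == 0:
            # A zero-probability prefix makes the whole sequence impossible;
            # stopping here also avoids dividing by a zero prefix probability.
            break
        total *= conditional(word, words[:k])
    return total


if __name__ == "__main__":
    sentence = ("the", "cat", "sat")
    print(chain_rule_prob(sentence))                       # 3/10
    print(chain_rule_prob(sentence) == TOY_DIST[sentence])  # True
```

For "the cat sat" the factors are $P(\text{the}) = 6/10$, $P(\text{cat} \mid \text{the}) = 2/3$, and $P(\text{sat} \mid \text{the cat}) = 3/4$, whose product is $3/10$. Each factor is a ratio of consecutive prefix probabilities, so the product telescopes back to $P(w_{1:n})$. This is why the chain rule holds exactly.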