Chain Rule of Probability for Word Sequences

Represent a sequence of $n$ words as either $w_1, \dots, w_n$ or $w_{1:n}$. The joint probability of observing this exact sequence is denoted $P(w_1, \dots, w_n)$ or $P(w_{1:n})$. Applying the chain rule of probability decomposes this joint probability into a product of conditional probabilities: $P(w_{1:n}) = P(w_1)P(w_2|w_1)\dots P(w_n|w_{1:n-1}) = \prod_{k=1}^{n}P(w_k|w_{1:k-1})$. For $k = 1$ the history $w_{1:0}$ is empty, so the first factor is simply $P(w_1)$.
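The decomposition can be checked numerically on a small example. The sketch below is illustrative only: the toy corpus and the helper names `conditional` and `joint` are hypothetical, and each conditional probability is estimated by simple counting over sentence prefixes, which is an assumption made here rather than a method stated above.

```python
from collections import Counter
from fractions import Fraction
import math

# Hypothetical toy corpus; each sentence is a tuple of words.
corpus = [
    ("the", "cat", "sat"),
    ("the", "cat", "ran"),
    ("the", "dog", "sat"),
    ("a", "dog", "ran"),
]

# Count how many sentences begin with each prefix w_{1:k}.
# The empty prefix () is counted once per sentence.
prefix_counts = Counter()
for sentence in corpus:
    for k in range(len(sentence) + 1):
        prefix_counts[sentence[:k]] += 1


def conditional(word, history):
    """Estimate P(word | history) as count(history + word) / count(history)."""
    history = tuple(history)
    if prefix_counts[history] == 0:
        # Unseen history: assign probability zero in this simple sketch.
        return Fraction(0)
    return Fraction(prefix_counts[history + (word,)], prefix_counts[history])


def joint(words):
    """Chain rule: multiply P(w_k | w_{1:k-1}) for k = 1..n."""
    words = tuple(words)
    return math.prod(conditional(w, words[:k]) for k, w in enumerate(words))


# P(the) * P(cat | the) * P(sat | the, cat) = 3/4 * 2/3 * 1/2
print(joint(("the", "cat", "sat")))  # prints 1/4
```

Under this counting estimate the factors telescope, so the product equals the fraction of corpus sentences that begin with the full sequence (one out of four here). Exact fractions are used instead of floats so the equality holds without rounding error.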

Updated 2026-06-14

Tags

Data Science
