Concept

Markov Assumption in N-Gram Models

Calculating the exact probability of a word sequence with the chain rule requires conditioning each word on the entire history of preceding words. This is impractical: most long histories never occur in the training data, so their counts are too sparse to estimate reliably, and the number of distinct histories grows exponentially with their length. The Markov assumption simplifies the problem by assuming that the probability of a word depends only on a fixed window of preceding words. For an $n$-gram model, the history is limited to the previous $n-1$ words, giving the approximation $P(w_k \mid w_{1:k-1}) \approx P(w_k \mid w_{k-n+1:k-1})$.
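The truncation can be illustrated with a minimal Python sketch. It assumes maximum-likelihood estimation from raw counts with no smoothing, and the names `train_ngram` and `cond_prob` and the toy corpus are hypothetical, chosen only for this illustration.

```python
from collections import Counter


def train_ngram(tokens, n):
    """Count n-grams and their (n-1)-word histories in a token list."""
    ngram_counts = Counter()
    history_counts = Counter()
    for i in range(len(tokens) - n + 1):
        history = tuple(tokens[i:i + n - 1])
        word = tokens[i + n - 1]
        ngram_counts[history + (word,)] += 1
        history_counts[history] += 1
    return ngram_counts, history_counts


def cond_prob(word, history, ngram_counts, history_counts, n):
    """Unsmoothed estimate of P(word | last n-1 words of history)."""
    # Markov assumption: discard everything except the last n-1 words.
    truncated = tuple(history[-(n - 1):]) if n > 1 else ()
    denom = history_counts[truncated]
    if denom == 0:
        return 0.0  # unseen history; a real model would smooth or back off
    return ngram_counts[truncated + (word,)] / denom


# Toy corpus (an assumption for illustration only).
tokens = "the cat sat on the mat the cat ate".split()
n = 2  # bigram model: condition on one preceding word
ngram_counts, history_counts = train_ngram(tokens, n)

# Under the bigram assumption, the long history and the single word "the"
# yield the same estimate: count("the cat") / count("the" as a history) = 2/3.
p_long = cond_prob("cat", ["sat", "on", "the"], ngram_counts, history_counts, n)
p_short = cond_prob("cat", ["the"], ngram_counts, history_counts, n)
```

Here the full history `["sat", "on", "the"]` is cut down to `("the",)` before the lookup, which is exactly the replacement of $w_{1:k-1}$ by $w_{k-n+1:k-1}$ for $n = 2$. Returning zero for unseen histories is a simplification; practical models typically apply smoothing or backoff.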

Updated 2026-06-14

Tags

Data Science
