Concept
Markov Assumption in N-Gram Models
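The chain rule gives the exact probability of a word sequence as $P(w_1, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})$. Each factor conditions on the entire history of preceding words. This leads to severe data sparsity, because most long histories never occur in the training data and their conditional probabilities cannot be estimated reliably. It also makes computation expensive. The Markov assumption simplifies this by assuming that the probability of a word depends only on a fixed window of preceding words. For an $n$-gram model, the history is limited to the previous $n-1$ words, approximating $P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$.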
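The sketch below shows the Markov assumption in practice. It is a minimal illustration, not a reference implementation. It estimates $P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$ by maximum likelihood, dividing the count of each $n$-gram by the count of its $(n-1)$-word history, and multiplies these truncated-history factors in place of the full chain rule. Several details are assumptions made for illustration:

- The function names `train_ngram_counts`, `conditional_prob` and `sentence_prob` are hypothetical.
- Padding with `<s>` and `</s>` tokens is a common convention, not something this card specifies.
- No smoothing is applied.

```python
from collections import Counter


def train_ngram_counts(sentences, n):
    """Count n-grams and their (n-1)-word histories (hypothetical helper).

    Assumption: each sentence is padded with n-1 '<s>' start tokens and one
    '</s>' end token so that every word has a full-length history.
    """
    ngram_counts = Counter()
    history_counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - n + 1:i])  # only the previous n-1 words
            ngram_counts[history + (padded[i],)] += 1
            history_counts[history] += 1
    return ngram_counts, history_counts


def conditional_prob(word, history, ngram_counts, history_counts):
    """Maximum-likelihood estimate of P(word | history), with no smoothing."""
    h = tuple(history)
    if history_counts[h] == 0:
        return 0.0  # unseen history: MLE cannot estimate anything
    return ngram_counts[h + (word,)] / history_counts[h]


def sentence_prob(tokens, n, ngram_counts, history_counts):
    """Chain rule under the Markov assumption: each factor sees n-1 words."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    prob = 1.0
    for i in range(n - 1, len(padded)):
        history = padded[i - n + 1:i]  # truncated history instead of padded[:i]
        prob *= conditional_prob(padded[i], history, ngram_counts, history_counts)
    return prob


# Usage on a toy corpus with a bigram model (n = 2).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
counts, hists = train_ngram_counts(corpus, n=2)
p_cat = conditional_prob("cat", ["the"], counts, hists)            # 2/3
p_sent = sentence_prob(["the", "cat", "sat"], 2, counts, hists)    # 1/3
p_unseen = sentence_prob(["the", "dog", "ran"], 2, counts, hists)  # 0.0
```

The last line illustrates why the Markov assumption alone does not remove sparsity. "dog ran" never occurs in the toy corpus, so the unsmoothed estimate assigns the whole sentence probability zero. Practical n-gram models usually combine this truncated history with a smoothing method.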
Updated 2026-06-14
Tags
Data Science