Concept

General Equations of an N-Gram Model

The general equations of an n-gram model apply the Markov assumption to estimate word probabilities. Instead of conditioning on the entire history, the conditional probability of the next word is approximated by looking only $N-1$ words into the past:

$$P(w_n \mid w_{1:n-1}) \approx P(w_n \mid w_{n-N+1:n-1})$$
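As a minimal sketch of the Markov assumption, the Python functions below keep only the last $N-1$ words of the history and look the next word up in a probability table. The table `TRIGRAM_PROBS`, its values, and the helper names `markov_context` and `next_word_prob` are hypothetical and chosen only for illustration; how the probabilities are estimated is outside the scope of this sketch.

```python
def markov_context(history, order):
    """Return the last order-1 words of `history`, i.e. w_{n-N+1:n-1} with N = order."""
    if order < 1:
        raise ValueError("order must be at least 1")
    if order == 1:
        # A unigram model ignores the history entirely.
        return ()
    # Slicing also handles histories shorter than order-1 words.
    return tuple(history[-(order - 1):])


# Hypothetical trigram (N = 3) table: context tuple -> {next word: probability}.
TRIGRAM_PROBS = {
    ("the", "cat"): {"sat": 0.5, "ran": 0.3, "slept": 0.2},
}


def next_word_prob(word, history, order, table):
    """Approximate P(w_n | w_{1:n-1}) by P(w_n | w_{n-N+1:n-1})."""
    context = markov_context(history, order)
    # Unseen contexts or words get probability 0.0 (no smoothing in this sketch).
    return table.get(context, {}).get(word, 0.0)


history = ["yesterday", "the", "cat"]
print(markov_context(history, 3))                        # ('the', 'cat')
print(next_word_prob("sat", history, 3, TRIGRAM_PROBS))  # 0.5
```

The word `yesterday` is dropped from the context: under a trigram model it has no influence on the prediction of the next word.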

The probability of a complete word sequence is approximated as the product of these conditional probabilities:

$$P(w_{1:n}) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-N+1:k-1})$$
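The product can likewise be sketched in Python. This sketch assumes a hypothetical bigram table `BIGRAM_PROBS` and a hypothetical helper `sequence_prob`. For the first positions, where fewer than $N-1$ earlier words exist, it truncates the context to whatever words are available; this is an assumption of the sketch, and many presentations instead pad the sentence with a start symbol such as `<s>`.

```python
def sequence_prob(words, order, table):
    """Approximate P(w_{1:n}) as a product of N-gram conditional probabilities.

    `table` maps a context tuple (up to order-1 previous words) to a dict of
    next-word probabilities. Early contexts are truncated (an assumption).
    """
    prob = 1.0
    for k, word in enumerate(words):  # k is 0-based here
        # Keep at most order-1 words immediately preceding position k.
        start = max(0, k - (order - 1))
        context = tuple(words[start:k])
        # Unseen events contribute 0.0 (no smoothing in this sketch).
        prob *= table.get(context, {}).get(word, 0.0)
    return prob


# Hypothetical bigram (N = 2) table; the empty context covers the first word.
BIGRAM_PROBS = {
    (): {"i": 0.25},
    ("i",): {"like": 0.5},
    ("like",): {"tea": 0.2},
}

print(sequence_prob(["i", "like", "tea"], 2, BIGRAM_PROBS))  # 0.025
```

For this toy table the result is $0.25 \times 0.5 \times 0.2 = 0.025$. A single unseen bigram makes the whole product zero, and for long sequences the product is usually accumulated as a sum of log probabilities to avoid numerical underflow.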


Updated 2026-06-19

Tags

Deep Learning

Data Science

Machine Learning Yearning @ DeepLearning.AI

Dive into Deep Learning @ D2L

Machine Learning

Supervised Learning