Derivation of Most Probable HMM Tag Sequence

By Bayes' rule, the most probable tag sequence $\hat t_{1:n}$ for an observation sequence of words $w_1...w_n$ is:

$$\hat t_{1:n} = \argmax_{t_1, ..., t_n} P(t_1...t_n|w_1...w_n) = \argmax_{t_1, ..., t_n} \frac{P(w_1...w_n|t_1...t_n)P(t_1...t_n)}{P(w_1...w_n)} = \argmax_{t_1, ..., t_n} P(w_1...w_n|t_1...t_n)P(t_1...t_n)$$

The denominator $P(w_1...w_n)$ is the same for every candidate tag sequence, so it does not affect the argmax and can be dropped.

HMM taggers make two simplifying assumptions:

  1. The probability of a word appearing depends only on its own tag and is independent of neighboring words and tags: $P(w_1...w_n|t_1...t_n) \approx \prod_{i=1}^n P(w_i|t_i)$
  2. The probability of a tag depends only on the previous tag, rather than the entire tag sequence: $P(t_1...t_n) \approx \prod_{i=1}^n P(t_i|t_{i-1})$

Hence, it follows that: $\hat t_{1:n} \approx \argmax_{t_1, ..., t_n} \prod_{i=1}^n P(w_i|t_i)P(t_i|t_{i-1})$

The two factors inside the product, $P(w_i|t_i)$ and $P(t_i|t_{i-1})$, are the emission probability and the transition probability, respectively.
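To make the final formula concrete, here is a minimal Python sketch that scores a tag sequence as the product of emission and transition probabilities and then takes the argmax by brute force. The tag set, vocabulary, and all probability values are made up for illustration. The `<s>` start tag is an assumption standing in for $t_0$, which the derivation leaves implicit. The names `sequence_score` and `most_probable_tags` are hypothetical.

```python
from itertools import product

# Hypothetical toy model: tags, words, and probabilities are invented for
# illustration and do not come from any corpus.
TAGS = ["DT", "NN", "VB"]
START = "<s>"  # assumed start-of-sentence pseudo-tag playing the role of t_0

# transition[prev][tag] = P(tag | prev)
transition = {
    "<s>": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}

# emission[tag][word] = P(word | tag); words not listed get probability 0
emission = {
    "DT": {"the": 0.7},
    "NN": {"dog": 0.4, "barks": 0.1},
    "VB": {"dog": 0.05, "barks": 0.5},
}


def sequence_score(words, tags):
    """Return prod_i P(w_i | t_i) * P(t_i | t_{i-1}) for one tag sequence."""
    score = 1.0
    prev = START
    for word, tag in zip(words, tags):
        score *= emission[tag].get(word, 0.0) * transition[prev][tag]
        prev = tag
    return score


def most_probable_tags(words):
    """Brute-force argmax over every possible tag sequence of the same length."""
    candidates = product(TAGS, repeat=len(words))
    return max(candidates, key=lambda tags: sequence_score(words, tags))


if __name__ == "__main__":
    sentence = ["the", "dog", "barks"]
    best = most_probable_tags(sentence)
    print(best, sequence_score(sentence, best))
```

Enumerating every sequence takes time exponential in the sentence length, so this is only practical for toy inputs. In practice the same argmax is usually computed with the Viterbi dynamic programming algorithm.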

Updated 2026-06-14

Tags

Data Science