Learn Before
Concept

Derivation of $\hat t_{1:n}$

By Bayes' rule, we have that $\hat t_{1:n} = \argmax_{t_1, ..., t_n} P(t_1...t_n|w_1...w_n) = \argmax_{t_1, ..., t_n} \frac{P(w_1...w_n|t_1...t_n)P(t_1...t_n)}{P(w_1...w_n)} = \argmax_{t_1, ..., t_n} P(w_1...w_n|t_1...t_n)P(t_1...t_n)$, where the denominator $P(w_1...w_n)$ can be dropped because it is the same for every candidate tag sequence.

HMM taggers make two simplifying assumptions:

  • The probability of a word appearing depends only on its own tag and is independent of neighbouring words and tags: $P(w_1...w_n|t_1...t_n) \approx \prod_{i=1}^n P(w_i|t_i)$
  • The probability of a tag is dependent only on the previous tag, rather than the entire tag sequence: $P(t_1...t_n) \approx \prod_{i=1}^n P(t_i|t_{i-1})$
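Under these two assumptions, the joint probability of a word/tag sequence factors into one emission term and one transition term per position. A minimal sketch, assuming made-up toy tags and probability values (not estimated from any real corpus), with a `<s>` pseudo-tag standing in for $t_0$:

```python
# Toy emission P(word|tag) and transition P(tag|prev_tag) tables.
# All tags and numbers are illustrative assumptions, not corpus estimates.
emission = {("the", "DET"): 0.7, ("dog", "NOUN"): 0.4, ("runs", "VERB"): 0.3}
transition = {("<s>", "DET"): 0.5, ("DET", "NOUN"): 0.6, ("NOUN", "VERB"): 0.4}

def joint_prob(words, tags):
    """P(w_1..w_n, t_1..t_n) ~= prod_i P(w_i|t_i) * P(t_i|t_{i-1})."""
    p = 1.0
    prev = "<s>"  # start-of-sentence pseudo-tag playing the role of t_0
    for w, t in zip(words, tags):
        p *= emission.get((w, t), 0.0) * transition.get((prev, t), 0.0)
        prev = t
    return p

print(joint_prob(["the", "dog", "runs"], ["DET", "NOUN", "VERB"]))
```

Unseen word/tag or tag/tag pairs default to probability 0 here; a real tagger would smooth these counts instead.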

Hence it follows that $\hat t_{1:n} \approx \argmax_{t_1, ..., t_n} \prod_{i=1}^n P(w_i|t_i)P(t_i|t_{i-1})$, the two parts of which correspond to the emission probability and the transition probability respectively.
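This argmax can be found naively by scoring all $|T|^n$ tag sequences; in practice HMM taggers use the Viterbi algorithm to compute the same maximum in $O(n|T|^2)$ time. A brute-force sketch of the objective, assuming a hypothetical three-tag set and invented probability tables:

```python
from itertools import product

# Hypothetical emission P(word|tag) and transition P(tag|prev_tag) tables;
# "<s>" is a start-of-sentence pseudo-tag. Values are illustrative only.
emission = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "the": 0.01},
    "VERB": {"barks": 0.4},
}
transition = {
    "<s>": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
    "DET": {"NOUN": 0.8, "VERB": 0.1, "DET": 0.1},
    "NOUN": {"VERB": 0.6, "NOUN": 0.2, "DET": 0.2},
    "VERB": {"DET": 0.4, "NOUN": 0.3, "VERB": 0.3},
}
TAGS = ["DET", "NOUN", "VERB"]

def best_tag_sequence(words):
    """argmax over t_1..t_n of prod_i P(w_i|t_i) * P(t_i|t_{i-1})."""
    best, best_p = None, -1.0
    for tags in product(TAGS, repeat=len(words)):
        p, prev = 1.0, "<s>"
        for w, t in zip(words, tags):
            p *= emission[t].get(w, 0.0) * transition[prev][t]
            prev = t
        if p > best_p:
            best, best_p = tags, p
    return best, best_p

print(best_tag_sequence(["the", "dog", "barks"]))
```

Viterbi reaches the same answer by keeping, for each position and tag, only the best-scoring path ending in that tag, which is why the exhaustive enumeration above is never needed in practice.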


Updated 2021-11-07

Tags

Data Science