By Bayes' rule, we have that
$$\begin{aligned}
\hat{t}_{1:n} &= \arg\max_{t_1,\dots,t_n} P(t_1 \dots t_n \mid w_1 \dots w_n) \\
&= \arg\max_{t_1,\dots,t_n} \frac{P(w_1 \dots w_n \mid t_1 \dots t_n)\, P(t_1 \dots t_n)}{P(w_1 \dots w_n)} \\
&= \arg\max_{t_1,\dots,t_n} P(w_1 \dots w_n \mid t_1 \dots t_n)\, P(t_1 \dots t_n)
\end{aligned}$$
where the denominator $P(w_1 \dots w_n)$ can be dropped because it does not depend on the tag sequence being maximised over.
HMM taggers make two simplifying assumptions:
- The probability of a word appearing depends only on its own tag and is independent of neighbouring words and tags:
$$P(w_1 \dots w_n \mid t_1 \dots t_n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$$
- The probability of a tag is dependent only on the previous tag, rather than the entire tag sequence:
$$P(t_1 \dots t_n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})$$
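Both kinds of probability can be estimated from a tagged corpus by maximum-likelihood counting: $P(w_i \mid t_i)$ as the fraction of occurrences of tag $t_i$ emitting word $w_i$, and $P(t_i \mid t_{i-1})$ as the fraction of occurrences of $t_{i-1}$ followed by $t_i$. A minimal sketch, using a hypothetical toy corpus and a `<s>` start-of-sentence pseudo-tag (both are illustrative assumptions, not part of the derivation above):

```python
from collections import Counter

# Hypothetical toy corpus: sentences as lists of (word, tag) pairs.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

emission = Counter()    # counts of (tag, word) pairs
transition = Counter()  # counts of (previous tag, tag) pairs
tag_count = Counter()   # counts of each tag, including the start marker

for sentence in corpus:
    prev = "<s>"            # pseudo-tag so that P(t_1 | t_0) is defined
    tag_count[prev] += 1    # one start marker per sentence
    for word, tag in sentence:
        emission[(tag, word)] += 1
        transition[(prev, tag)] += 1
        tag_count[tag] += 1
        prev = tag

def p_emit(word, tag):
    # MLE estimate of the emission probability P(word | tag)
    return emission[(tag, word)] / tag_count[tag]

def p_trans(tag, prev):
    # MLE estimate of the transition probability P(tag | prev)
    return transition[(prev, tag)] / tag_count[prev]

p_emit("dog", "NN")     # 1 of the 2 NN tokens emits "dog", so 0.5
p_trans("DT", "<s>")    # both sentences start with DT, so 1.0
```

In practice these counts are smoothed to avoid zero probabilities for unseen word–tag and tag–tag pairs.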
Hence it follows that
$$\hat{t}_{1:n} \approx \arg\max_{t_1,\dots,t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$
the two factors of which correspond to the emission probability $P(w_i \mid t_i)$ and the transition probability $P(t_i \mid t_{i-1})$, respectively.
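The argmax over all $|T|^n$ tag sequences is computed efficiently with the Viterbi algorithm, which exploits the two independence assumptions above. A minimal sketch in log space, assuming a hypothetical two-tag model whose `start`, `trans`, and `emit` tables are made-up numbers for illustration:

```python
import math

tags = ["DT", "NN"]
start = {"DT": 0.8, "NN": 0.2}                     # P(t_1 | <s>)
trans = {("DT", "NN"): 0.9, ("DT", "DT"): 0.1,
         ("NN", "NN"): 0.4, ("NN", "DT"): 0.6}     # P(t_i | t_{i-1})
emit = {("DT", "the"): 0.7, ("NN", "the"): 0.01,
        ("DT", "dog"): 0.01, ("NN", "dog"): 0.5}   # P(w_i | t_i)

def viterbi(words):
    # delta[t] = best log-probability of any tag sequence ending in tag t;
    # 1e-12 is a crude floor for unseen emissions.
    delta = {t: math.log(start[t] * emit.get((t, words[0]), 1e-12))
             for t in tags}
    back = []  # per-position backpointers for recovering the argmax sequence
    for w in words[1:]:
        new_delta, ptr = {}, {}
        for t in tags:
            best_prev = max(tags,
                            key=lambda p: delta[p] + math.log(trans[(p, t)]))
            new_delta[t] = (delta[best_prev]
                            + math.log(trans[(best_prev, t)])
                            + math.log(emit.get((t, w), 1e-12)))
            ptr[t] = best_prev
        delta = new_delta
        back.append(ptr)
    # Trace back from the best final tag.
    t = max(tags, key=delta.get)
    seq = [t]
    for ptr in reversed(back):
        t = ptr[t]
        seq.append(t)
    return list(reversed(seq))

viterbi(["the", "dog"])  # returns ["DT", "NN"]
```

Because each step only needs the best score per previous tag, the runtime is $O(n \cdot |T|^2)$ rather than exponential in $n$.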