By Bayes' rule, we have that
$$\begin{aligned}
\hat{t}_{1:n} &= \arg\max_{t_1,\dots,t_n} P(t_1 \dots t_n \mid w_1 \dots w_n) \\
&= \arg\max_{t_1,\dots,t_n} \frac{P(w_1 \dots w_n \mid t_1 \dots t_n)\, P(t_1 \dots t_n)}{P(w_1 \dots w_n)} \\
&= \arg\max_{t_1,\dots,t_n} P(w_1 \dots w_n \mid t_1 \dots t_n)\, P(t_1 \dots t_n)
\end{aligned}$$
where the denominator $P(w_1 \dots w_n)$ can be dropped because it does not depend on the tag sequence being maximised over.
HMM taggers make two simplifying assumptions:
- The probability of a word appearing depends only on its own tag and is independent of neighbouring words and tags:
$$P(w_1 \dots w_n \mid t_1 \dots t_n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$$
- The probability of a tag is dependent only on the previous tag, rather than the entire tag sequence:
$$P(t_1 \dots t_n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})$$
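Both kinds of probability can be estimated from a tagged corpus by maximum-likelihood counting: $P(w_i \mid t_i)$ as the fraction of occurrences of tag $t_i$ emitting word $w_i$, and $P(t_i \mid t_{i-1})$ as the fraction of occurrences of $t_{i-1}$ followed by $t_i$. A minimal sketch, using a hypothetical toy corpus and a `<s>` start-of-sentence pseudo-tag (both are illustrative assumptions, not part of the derivation above):

```python
from collections import Counter

# Hypothetical toy corpus: sentences as lists of (word, tag) pairs.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

emission = Counter()    # counts of (tag, word) pairs
transition = Counter()  # counts of (previous tag, tag) pairs
tag_count = Counter()   # counts of each tag, including the start marker

for sentence in corpus:
    prev = "<s>"            # pseudo-tag so that P(t_1 | t_0) is defined
    tag_count[prev] += 1    # one start marker per sentence
    for word, tag in sentence:
        emission[(tag, word)] += 1
        transition[(prev, tag)] += 1
        tag_count[tag] += 1
        prev = tag

def p_emit(word, tag):
    # MLE estimate of the emission probability P(word | tag)
    return emission[(tag, word)] / tag_count[tag]

def p_trans(tag, prev):
    # MLE estimate of the transition probability P(tag | prev)
    return transition[(prev, tag)] / tag_count[prev]

p_emit("dog", "NN")     # 1 of the 2 NN tokens emits "dog", so 0.5
p_trans("DT", "<s>")    # both sentences start with DT, so 1.0
```

In practice these counts are smoothed to avoid zero probabilities for unseen word–tag and tag–tag pairs.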
Hence it follows that
$$\hat{t}_{1:n} \approx \arg\max_{t_1,\dots,t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$
the two factors of which correspond to the emission probability $P(w_i \mid t_i)$ and the transition probability $P(t_i \mid t_{i-1})$, respectively.
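The argmax over all $|T|^n$ tag sequences is computed efficiently with the Viterbi algorithm, which exploits the two independence assumptions above. A minimal sketch in log space, assuming a hypothetical two-tag model whose `start`, `trans`, and `emit` tables are made-up numbers for illustration:

```python
import math

tags = ["DT", "NN"]
start = {"DT": 0.8, "NN": 0.2}                     # P(t_1 | <s>)
trans = {("DT", "NN"): 0.9, ("DT", "DT"): 0.1,
         ("NN", "NN"): 0.4, ("NN", "DT"): 0.6}     # P(t_i | t_{i-1})
emit = {("DT", "the"): 0.7, ("NN", "the"): 0.01,
        ("DT", "dog"): 0.01, ("NN", "dog"): 0.5}   # P(w_i | t_i)

def viterbi(words):
    # delta[t] = best log-probability of any tag sequence ending in tag t;
    # 1e-12 is a crude floor for unseen emissions.
    delta = {t: math.log(start[t] * emit.get((t, words[0]), 1e-12))
             for t in tags}
    back = []  # per-position backpointers for recovering the argmax sequence
    for w in words[1:]:
        new_delta, ptr = {}, {}
        for t in tags:
            best_prev = max(tags,
                            key=lambda p: delta[p] + math.log(trans[(p, t)]))
            new_delta[t] = (delta[best_prev]
                            + math.log(trans[(best_prev, t)])
                            + math.log(emit.get((t, w), 1e-12)))
            ptr[t] = best_prev
        delta = new_delta
        back.append(ptr)
    # Trace back from the best final tag.
    t = max(tags, key=delta.get)
    seq = [t]
    for ptr in reversed(back):
        t = ptr[t]
        seq.append(t)
    return list(reversed(seq))

viterbi(["the", "dog"])  # returns ["DT", "NN"]
```

Because each step only needs the best score per previous tag, the runtime is $O(n \cdot |T|^2)$ rather than exponential in $n$.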