Learn Before
  • Huge Language Models

Stupid Backoff

Stupid backoff gives up the idea of trying to make the language model a true probability distribution, and it does not discount the higher-order probabilities. If a higher-order n-gram has a zero count, we simply back off to a lower-order n-gram, weighted by a fixed (context-independent) weight $\lambda$, for which a value of 0.4 has been found to work well.

The stupid backoff score for an n-gram is given by:

$$S(w_i \mid w^{i-1}_{i-k+1}) = \begin{cases} \dfrac{\text{count}(w^i_{i-k+1})}{\text{count}(w^{i-1}_{i-k+1})} & \text{if } \text{count}(w^i_{i-k+1}) > 0 \\ \lambda\, S(w_i \mid w^{i-1}_{i-k+2}) & \text{otherwise} \end{cases}$$

The backoff terminates at the unigram, which has score $S(w) = \frac{\text{count}(w)}{N}$, where $N$ is the total number of word tokens in the corpus. Because the result is no longer a normalized probability distribution, it is written as a score $S$ rather than a probability $P$.
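For concreteness, here is a minimal Python sketch of this scoring rule. The helper names (`train_counts`, `stupid_backoff`), the default n-gram order, and the toy corpus are illustrative assumptions rather than part of the original description; the recursion is unrolled into a loop that multiplies by $\lambda = 0.4$ at each backoff step.

```python
from collections import Counter

def train_counts(tokens, max_order=3):
    """Count every n-gram of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(word, context, counts, total_tokens, lam=0.4):
    """Return S(word | context) under stupid backoff.

    `context` is a tuple of preceding words; the returned value is a
    relative frequency scaled by lam per backoff step, not a probability.
    """
    weight = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # higher-order n-gram was observed: use its relative frequency
            return weight * counts[ngram] / counts[context]
        context = context[1:]  # back off to a shorter context
        weight *= lam          # each backoff step multiplies by lambda
    # base case: unigram score count(w) / N
    return weight * counts[(word,)] / total_tokens

tokens = "the cat sat on the mat the cat ate".split()
counts = train_counts(tokens, max_order=3)
N = len(tokens)

# Observed trigram: count("the cat sat") / count("the cat") = 1/2
print(stupid_backoff("sat", ("the", "cat"), counts, N))

# Unseen trigram and bigram: backs off twice to the unigram,
# giving 0.4 * 0.4 * count("mat") / N
print(stupid_backoff("mat", ("the", "cat"), counts, N))
```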

Tags

Data Science

Related
  • Stupid Backoff