Stupid Backoff
Stupid backoff gives up the idea of trying to make the language model a true probability distribution, and it does not discount the higher-order probabilities either. If a higher-order n-gram has a zero count, we simply back off to a lower-order n-gram, weighted by a fixed (context-independent) weight $\lambda$, for which a value of 0.4 has been found to work well.
Because the result is no longer a probability, it is written as a score $S$ rather than $P$. The stupid backoff score applied to an n-gram is given by:

$$
S(w_i \mid w_{i-N+1}^{i-1}) =
\begin{cases}
\dfrac{\mathrm{count}(w_{i-N+1}^{i})}{\mathrm{count}(w_{i-N+1}^{i-1})} & \text{if } \mathrm{count}(w_{i-N+1}^{i}) > 0 \\[1ex]
\lambda \, S(w_i \mid w_{i-N+2}^{i-1}) & \text{otherwise}
\end{cases}
$$

The backoff terminates in the unigram, which has probability $S(w) = \mathrm{count}(w)/N$, where $N$ is the total number of tokens in the corpus.
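As a rough illustration of this recursion, here is a minimal Python sketch. The function name `stupid_backoff_score`, the `Counter`-based count table, and the toy corpus are illustrative assumptions, not part of the original text.

```python
from collections import Counter

def stupid_backoff_score(ngram, counts, total_tokens, lam=0.4):
    """Score an n-gram (a tuple of tokens) with stupid backoff.

    `counts` is a Counter over token tuples of every order 1..N,
    `total_tokens` is the corpus size N, and `lam` is the fixed
    context-independent backoff weight (0.4 by default).
    """
    if len(ngram) == 1:
        # Backoff terminates in the unigram: S(w) = count(w) / N
        return counts[ngram] / total_tokens
    if counts[ngram] > 0:
        # Higher-order n-gram was seen: use its relative frequency
        return counts[ngram] / counts[ngram[:-1]]
    # Zero count: back off to the (n-1)-gram, scaled by the fixed weight
    return lam * stupid_backoff_score(ngram[1:], counts, total_tokens, lam)


# Toy example: collect unigram, bigram, and trigram counts from a tiny corpus.
corpus = "the cat sat on the mat the cat ran".split()
counts = Counter()
for n in (1, 2, 3):
    for i in range(len(corpus) - n + 1):
        counts[tuple(corpus[i:i + n])] += 1

total_tokens = len(corpus)
print(stupid_backoff_score(("the", "cat", "sat"), counts, total_tokens))  # seen trigram
print(stupid_backoff_score(("cat", "sat", "ran"), counts, total_tokens))  # backs off twice
```

In the second call the trigram and the backed-off bigram both have zero counts, so the score is $\lambda^2$ times the unigram relative frequency of the final word.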