Learn Before
  • Huge Language Models

Stupid Backoff

Stupid backoff gives up the idea of trying to make the language model a true probability distribution, and it does not discount the higher-order probabilities. If a higher-order n-gram has a zero count, we simply back off to a lower-order n-gram, weighted by a fixed (context-independent) weight $\lambda$, for which a value of 0.4 has been found to work well.

The stupid backoff score for an n-gram is given by:

$$S(w_i \mid w^{i-1}_{i-k+1}) = \begin{cases} \dfrac{\text{count}(w^i_{i-k+1})}{\text{count}(w^{i-1}_{i-k+1})} & \text{if } \text{count}(w^i_{i-k+1}) > 0 \\ \lambda\, S(w_i \mid w^{i-1}_{i-k+2}) & \text{otherwise} \end{cases}$$

The backoff terminates at the unigram, which has score $S(w) = \frac{\text{count}(w)}{N}$, where $N$ is the total number of word tokens in the corpus. Because the result is no longer a normalized probability distribution, it is written as a score $S$ rather than a probability $P$.
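For concreteness, here is a minimal Python sketch of this scoring rule. The helper names (`train_counts`, `stupid_backoff`), the default n-gram order, and the toy corpus are illustrative assumptions rather than part of the original description; the recursion is unrolled into a loop that multiplies by $\lambda = 0.4$ at each backoff step.

```python
from collections import Counter

def train_counts(tokens, max_order=3):
    """Count every n-gram of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(word, context, counts, total_tokens, lam=0.4):
    """Return S(word | context) under stupid backoff.

    `context` is a tuple of preceding words; the returned value is a
    relative frequency scaled by lam per backoff step, not a probability.
    """
    weight = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # higher-order n-gram was observed: use its relative frequency
            return weight * counts[ngram] / counts[context]
        context = context[1:]  # back off to a shorter context
        weight *= lam          # each backoff step multiplies by lambda
    # base case: unigram score count(w) / N
    return weight * counts[(word,)] / total_tokens

tokens = "the cat sat on the mat the cat ate".split()
counts = train_counts(tokens, max_order=3)
N = len(tokens)

# Observed trigram: count("the cat sat") / count("the cat") = 1/2
print(stupid_backoff("sat", ("the", "cat"), counts, N))

# Unseen trigram and bigram: backs off twice to the unigram,
# giving 0.4 * 0.4 * count("mat") / N
print(stupid_backoff("mat", ("the", "cat"), counts, N))
```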

Tags

Data Science

Related
  • Stupid Backoff