Concept
Zipf's Law for n-grams and Sparsity
The power-law distribution described by Zipf's law applies not only to individual words (unigrams) but also to sequences of words, such as bigrams and trigrams, though typically with a smaller exponent α. Because many n-grams occur very rarely in a corpus, methods that rely solely on counting statistics face a severe sparsity problem and tend to overestimate the frequency of infrequent combinations. This prevalence of rare n-grams makes counting-based methods inadequate for language modeling and strongly motivates the transition to deep learning models.
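The sparsity described above can be checked on any tokenized text. The following is a minimal sketch, not the D2L implementation: the helper names (ngram_counts, singleton_fraction, zipf_exponent) and the toy sentence are assumptions made for illustration. It counts n-grams, reports what fraction of distinct n-grams occur exactly once, and estimates the exponent as the negated least-squares slope of log frequency against log rank, assuming frequency is roughly proportional to 1 / rank^α.

```python
from collections import Counter
import math


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def singleton_fraction(counts):
    """Fraction of distinct n-grams that occur exactly once."""
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


def zipf_exponent(counts):
    """Estimate alpha by fitting log(frequency) = -alpha * log(rank) + c.

    Uses an ordinary least-squares fit over all ranks, which is a crude
    estimate; the head and tail of real data often deviate from a line.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Toy corpus (an assumption for illustration; far too small for a
# meaningful exponent estimate, but enough to show sparsity growing with n).
tokens = "the cat sat on the mat and the cat ate the fish".split()

for n in (1, 2, 3):
    counts = ngram_counts(tokens, n)
    print(n, len(counts), singleton_fraction(counts))
```

Even in this tiny example, the share of n-grams seen only once grows as n increases. On a real corpus, passing the counts to zipf_exponent for n = 1, 2, 3 gives a rough way to compare the fitted exponents across sequence lengths.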
Updated 2026-05-13
Tags
D2L
Dive into Deep Learning @ D2L