Concept

Zipf's Law for n-grams and Sparsity

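The power-law distribution described by Zipf's law, in which the frequency of the $i$-th most frequent item scales roughly as $1/i^\alpha$, applies not only to individual words (unigrams) but also to sequences of words such as bigrams and trigrams, though typically with a smaller exponent $\alpha$ as the sequence length grows. Because the vast majority of distinct $n$-grams occur very rarely in a corpus, methods that rely solely on counting statistics face a severe sparsity problem: most valid combinations are observed only a handful of times or not at all, so their probabilities cannot be estimated reliably, and smoothed count-based estimates (for example, Laplace smoothing) tend to overestimate the frequency of this rare tail. The sheer number of rare $n$-grams makes counting-based methods poorly suited to language modeling and motivates the move to deep learning models.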
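The sparsity effect can be illustrated with a minimal Python sketch. The toy corpus and the helper names `ngram_counts`, `singleton_fraction`, and `loglog_slope` are assumptions made for this example and do not come from the source. The slope of log frequency against log rank serves only as a rough stand-in for $-\alpha$, and a corpus this small is far too short for a trustworthy fit.

```python
from collections import Counter
import math


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def singleton_fraction(counts):
    """Fraction of distinct n-grams that occur exactly once."""
    return sum(1 for c in counts.values() if c == 1) / len(counts)


def loglog_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    If frequency is proportional to 1 / rank**alpha, the slope is about -alpha.
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy corpus: an assumption for illustration, far too small for a reliable fit.
text = ("the cat sat on the mat and the dog sat on the rug "
        "while the cat and the dog looked at the mat")
tokens = text.split()

for n in (1, 2, 3):
    counts = ngram_counts(tokens, n)
    print(f"n={n}: distinct={len(counts)}, "
          f"seen once={singleton_fraction(counts):.2f}, "
          f"estimated alpha={-loglog_slope(counts):.2f}")
```

Even on this tiny sample, the share of $n$-grams seen exactly once grows with $n$. Running the same functions on a real tokenized corpus would be needed to compare the fitted exponents meaningfully.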

Updated 2026-05-13

Tags

D2L

Dive into Deep Learning @ D2L