Concept
Building a Vocabulary
A fundamental preprocessing step in natural language processing is constructing a vocabulary from a text corpus. The procedure counts the unique tokens in the dataset and maps each one to an integer index. To limit the vocabulary size and manage data sparsity, tokens that occur fewer times than a specified minimum frequency are excluded from the vocabulary and are instead mapped to a single shared placeholder, commonly called the unknown token.
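The procedure can be sketched in a few lines of Python. This is a minimal illustration, not the D2L implementation: the function names `build_vocab` and `encode`, the parameter `min_freq`, the placeholder string `<unk>`, and the choice to put the placeholder at index 0 are all assumptions made for this example.

```python
from collections import Counter


def build_vocab(tokens, min_freq=2, unk_token="<unk>"):
    """Return (token_to_idx, idx_to_token) for a list of tokens.

    Tokens seen fewer than `min_freq` times are left out of the vocabulary.
    The placeholder `unk_token` always gets index 0 (an assumption of this sketch).
    """
    counts = Counter(tokens)
    # Keep tokens that meet the threshold; sort by descending frequency,
    # then alphabetically, so the index assignment is deterministic.
    kept = sorted(
        (t for t, c in counts.items() if c >= min_freq and t != unk_token),
        key=lambda t: (-counts[t], t),
    )
    idx_to_token = [unk_token] + kept
    token_to_idx = {t: i for i, t in enumerate(idx_to_token)}
    return token_to_idx, idx_to_token


def encode(tokens, token_to_idx, unk_token="<unk>"):
    """Map tokens to indices; anything outside the vocabulary maps to the placeholder."""
    unk_idx = token_to_idx[unk_token]
    return [token_to_idx.get(t, unk_idx) for t in tokens]


corpus = "the cat sat on the mat the cat ran".split()
token_to_idx, idx_to_token = build_vocab(corpus, min_freq=2)
ids = encode("the dog sat".split(), token_to_idx)

print(idx_to_token)  # ['<unk>', 'the', 'cat']
print(ids)           # [1, 0, 0]
```

In this toy corpus only "the" (3 occurrences) and "cat" (2 occurrences) reach the threshold, so "sat" is treated the same as the unseen word "dog": both map to the placeholder index.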
Updated 2026-05-25
Tags
D2L
Dive into Deep Learning @ D2L