Vocabulary Size in Transformers
In Transformer models, the vocabulary size, commonly denoted |V|, specifies the number of distinct tokens the model can recognize; each input token corresponds to one entry in this vocabulary. Choosing this size involves a clear trade-off: a larger vocabulary covers more surface-form variations of words, but because the embedding table stores one vector per vocabulary entry, it also increases the model's overall parameter count and storage requirements.
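To make the parameter cost concrete, the short sketch below is a minimal illustration (assuming PyTorch; the hidden size of 768 and the specific vocabulary sizes are example values only) of how the token-embedding table alone grows linearly with |V|.

import torch.nn as nn

d_model = 768  # example hidden/embedding size (BERT-base uses 768)

# Example vocabulary sizes: BERT (30,522), GPT-2 (50,257), and a large multilingual vocabulary.
for vocab_size in (30_522, 50_257, 250_000):
    embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=d_model)
    n_params = embedding.weight.numel()  # |V| * d_model entries in the embedding table
    print(f"|V| = {vocab_size:>7,} -> {n_params:>12,} embedding parameters "
          f"(~{n_params * 4 / 1e6:.0f} MB at 32-bit precision)")

Doubling |V| doubles this table, which is exactly the storage and parameter cost the trade-off above refers to.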