
Vocabulary Size in Transformers

In Transformer models, the vocabulary size, denoted |V|, specifies the number of distinct tokens the model can recognize. Each input token corresponds to a specific entry in this vocabulary V. Choosing the size of this vocabulary involves a clear trade-off: a larger vocabulary lets the model cover more surface-form variations of words, but it also increases the model's storage requirements and parameter count, since the embedding table grows linearly with |V|.
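The trade-off can be made concrete by counting embedding parameters. The sketch below is illustrative (the function name and default dimensions are assumptions, not from the text): the embedding table holds one d-dimensional vector per vocabulary entry, so its size is |V| × d, and doubling |V| doubles that cost.

```python
def embedding_params(vocab_size: int, d_model: int, tied: bool = True) -> int:
    """Parameter count of the token embedding table (illustrative sketch).

    The table stores one d_model-dimensional vector per vocabulary entry,
    i.e. |V| * d_model parameters. If the output projection is not tied
    to the input embedding, a second table of the same size is needed.
    """
    table = vocab_size * d_model
    return table if tied else 2 * table

# Doubling the vocabulary doubles the embedding storage cost:
small = embedding_params(32_000, 4096)   # 32k-token vocabulary
large = embedding_params(64_000, 4096)   # 64k-token vocabulary
print(small, large, large / small)       # the ratio is exactly 2.0
```

With a 4096-dimensional model, a 32k-entry vocabulary alone accounts for roughly 131 million parameters, which is why vocabulary size is chosen carefully rather than simply maximized.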


Updated 2026-04-17


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
