Concept
Shared Vocabulary for Input and Output in Language Models
In language modeling, the input tokens and the predicted output tokens are drawn from the same vocabulary. As a result, the input representation (for example, a one-hot vector per token) and the output layer (one score per candidate next token) have the same dimensionality, which equals the vocabulary size. This property distinguishes language modeling from sequence-to-sequence tasks such as machine translation, where the source and target vocabularies may differ.
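The shared dimensionality can be illustrated with a small sketch. It is not taken from the source: the toy vocabulary, the hidden size `num_hiddens`, the random weights `W_in` and `W_out`, and the helper functions are all hypothetical, chosen only to show that a one-hot input and the output distribution both have length equal to the vocabulary size.

```python
import math
import random

# Hypothetical toy vocabulary; real vocabularies are far larger.
vocab = ["<unk>", "the", "time", "machine", "by"]
token_to_idx = {tok: i for i, tok in enumerate(vocab)}
vocab_size = len(vocab)
num_hiddens = 4  # assumed hidden size, chosen arbitrarily


def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Input projection maps vocab_size -> num_hiddens.
W_in = [[rng.gauss(0, 0.1) for _ in range(vocab_size)]
        for _ in range(num_hiddens)]
# Output projection maps num_hiddens -> vocab_size.
W_out = [[rng.gauss(0, 0.1) for _ in range(num_hiddens)]
         for _ in range(vocab_size)]

# Encode one input token, compute a hidden state, and score every
# vocabulary entry as a candidate next token.
x = one_hot(token_to_idx["time"], vocab_size)
hidden = [math.tanh(h) for h in matvec(W_in, x)]
probs = softmax(matvec(W_out, hidden))

# Input vector and output distribution share the vocabulary dimension.
print(len(x), len(probs))  # 5 5
```

In a translation model, by contrast, the input width would follow the source vocabulary and the output width the target vocabulary, so the two lengths need not match.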
Updated 2026-05-14
Tags
D2L
Dive into Deep Learning @ D2L