Concept
One-Hot Encoding for Language Model Tokens
In language modeling, representing a token by its scalar index is ineffective because numerical proximity does not imply semantic similarity (for instance, the 45th and 46th words in the vocabulary are not necessarily related in meaning). Instead, each token is represented using a one-hot encoding: a vector whose length equals the vocabulary size, denoted $N$. In this vector, the entry at the token's index is set to $1$, while all other entries are set to $0$. For example, with a vocabulary of five elements and zero-based indexing, the index $2$ would be represented as the one-hot vector $[0, 0, 1, 0, 0]$.
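The encoding described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the helper name `one_hot`, the five-word vocabulary, and the zero-based indexing are all assumptions made for the example.

```python
def one_hot(index: int, vocab_size: int) -> list[int]:
    """Return a vector of length vocab_size with a 1 at `index` and 0 elsewhere."""
    if not 0 <= index < vocab_size:
        raise ValueError(f"index {index} is outside the vocabulary of size {vocab_size}")
    vector = [0] * vocab_size
    vector[index] = 1
    return vector


# Hypothetical five-token vocabulary, used only for illustration.
vocab = ["the", "cat", "sat", "on", "mat"]
token_to_index = {token: i for i, token in enumerate(vocab)}

# Encode a short token sequence as a list of one-hot vectors.
encoded = [one_hot(token_to_index[token], len(vocab)) for token in ["cat", "mat"]]

# The dot product of two different one-hot vectors is always 0, so the
# encoding does not make any pair of distinct tokens look "closer" than another.
dot = sum(a * b for a, b in zip(encoded[0], encoded[1]))
```

Here `encoded` holds one vector per token, each of length `len(vocab)`. Because every pair of distinct one-hot vectors is orthogonal, the representation avoids the spurious ordering that raw indices would suggest.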
Updated 2026-05-16
Tags
D2L
Dive into Deep Learning @ D2L