One-Hot Encoding for Language Model Tokens

In language modeling, representing a token by its scalar index is ineffective because numerical proximity does not imply semantic similarity (for instance, the 45th and 46th words in a vocabulary are not necessarily related in meaning). Instead, each token is represented using a one-hot encoding: a vector whose length equals the vocabulary size, denoted $N$. In this vector, the entry at the token's index is set to $1$, while all other entries are set to $0$. For example, with a vocabulary of five elements and indices counted from $0$, the index $2$ is represented as the one-hot vector $[0, 0, 1, 0, 0]$.
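The construction above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `one_hot`, the five-word vocabulary, and the `token_to_index` mapping are all hypothetical and chosen only to reproduce the five-element example, with indices counted from zero.

```python
def one_hot(index: int, vocab_size: int) -> list[int]:
    """Return a list of length vocab_size with a 1 at position index and 0 elsewhere."""
    if not 0 <= index < vocab_size:
        raise ValueError(f"index {index} is outside a vocabulary of size {vocab_size}")
    vector = [0] * vocab_size  # start with all entries set to 0
    vector[index] = 1          # mark the token's own position
    return vector


# Hypothetical five-word vocabulary, used only for illustration.
vocab = ["the", "cat", "sat", "on", "mat"]
token_to_index = {token: i for i, token in enumerate(vocab)}

encoded = one_hot(token_to_index["sat"], len(vocab))
print(encoded)  # [0, 0, 1, 0, 0]
```

Every vector produced this way contains exactly one nonzero entry, so no two distinct tokens are closer to each other than any other pair, which is the property the scalar index lacks.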

Updated 2026-05-16

Tags

D2L

Dive into Deep Learning @ D2L