Learn Before
Ground-Truth Distribution as a One-Hot Representation
In language modeling, the ground-truth distribution at a given position is defined as the one-hot representation of the actual next token: a vector with a 1 at the index of that token and 0 everywhere else. This one-hot vector acts as the exact target for the model's prediction at that step.
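A minimal sketch of this construction, using a hypothetical toy vocabulary (not from the course material): the ground-truth distribution is a one-hot vector over the vocabulary, with a 1 at the index of the actual next token.

```python
vocab = ["the", "cat", "sat", "on", "mat"]  # assumed example vocabulary

def one_hot(token, vocab):
    """Return the one-hot ground-truth distribution for `token`."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(token)] = 1.0  # 1 at the actual next token's index
    return vec

print(one_hot("sat", vocab))  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

The vector sums to 1, so it is a valid probability distribution that places all its mass on the correct token.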
References
Reference of Foundations of Large Language Models Course
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pre-training Objective for Language Models
Example of a Token Sequence
Example of an Indexed Token Sequence
A language model is evaluated on a sequence of four tokens, (x_0, x_1, x_2, x_3). The model's performance is measured by calculating a loss value at each step of the sequence generation. The individual losses are as follows: the loss for predicting token x_1 is 1.2, the loss for predicting x_2 is 0.5, and the loss for predicting x_3 is 2.3. Based on this information, what is the total loss for the entire token sequence?
Comparative Model Performance Analysis
A language model's performance is being evaluated on the token sequence ('The', 'cat', 'sat', 'on'). The total loss for this sequence is calculated by summing the individual losses from each predictive step. Which of the following sets of predictions contributes to this total loss calculation?
Ground-Truth Distribution as a One-Hot Representation
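The total-loss questions above rely on one calculation: the loss for a sequence is the sum of the per-step prediction losses. A quick sketch using the per-step numbers stated in the first question:

```python
# Per-step losses for predicting x_1, x_2, x_3 (from the question above).
step_losses = [1.2, 0.5, 2.3]

# The total loss for the sequence is the sum over all predictive steps.
total_loss = sum(step_losses)
print(round(total_loss, 6))  # 4.0
```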
Learn After
A language model is being trained on a text corpus where it learns to predict the next word in a sequence. The model's entire vocabulary is ordered as follows: ['a', 'bright', 'day', 'is', 'shining']. If the model is given the input context 'a bright' and the actual next word in the training data is 'day', which vector correctly represents the ground-truth target for this specific training step?
In the context of training a language model, representing the ground-truth distribution as a one-hot vector implies that the training process considers all incorrect tokens to be equally wrong, regardless of their semantic similarity to the correct token.
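The "equally wrong" point above can be illustrated directly: with a one-hot target, cross-entropy depends only on the probability assigned to the correct token, so how the remaining mass is spread over incorrect tokens does not change the loss. The predicted distributions below are made-up toy values over the five-word vocabulary from the question.

```python
import math

def cross_entropy(one_hot_target, predicted):
    # Only the term where the target is 1 contributes to the sum,
    # so the loss reduces to -log p(correct token).
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, predicted) if t > 0)

target = [0.0, 0.0, 1.0, 0.0, 0.0]       # 'day' is the correct next word
pred_a = [0.10, 0.10, 0.60, 0.10, 0.10]  # mass spread evenly over wrong words
pred_b = [0.35, 0.02, 0.60, 0.02, 0.01]  # mass concentrated on one wrong word

# Both predictions give 'day' probability 0.6, so their losses are equal.
print(cross_entropy(target, pred_a) == cross_entropy(target, pred_b))  # True
```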
Explaining the Ground-Truth Vector