Explaining the Ground-Truth Vector
In the training process for a language model, the ground-truth distribution for the correct next token is represented as a vector containing a single '1' and zeros for all other entries. In your own words, explain what the position of the '1' signifies and what this vector structure implies about the training target.
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained on a text corpus where it learns to predict the next word in a sequence. The model's entire vocabulary is ordered as follows:
['a', 'bright', 'day', 'is', 'shining']. If the model is given the input context 'a bright' and the actual next word in the training data is 'day', which vector correctly represents the ground-truth target for this specific training step?

In the context of training a language model, representing the ground-truth distribution as a one-hot vector implies that the training process considers all incorrect tokens to be equally wrong, regardless of their semantic similarity to the correct token.
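A minimal sketch of the idea, using the five-word vocabulary from the related question above (the helper name `one_hot` is illustrative, not from the source):

```python
# Vocabulary from the related question; its order fixes each token's index.
vocab = ['a', 'bright', 'day', 'is', 'shining']

def one_hot(token, vocab):
    """Build the ground-truth distribution: a '1' at the index of the
    correct next token, and '0' for every other vocabulary entry."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(token)] = 1.0
    return vec

# Context 'a bright', actual next word 'day' -> the '1' lands at index 2.
target = one_hot('day', vocab)
print(target)  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

Because every incorrect entry is exactly 0, the target assigns no partial credit: a semantically close wrong token (e.g. 'shining') is penalized just as much as an unrelated one.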