In the context of training a language model, representing the ground-truth distribution as a one-hot vector implies that the training process considers all incorrect tokens to be equally wrong, regardless of their semantic similarity to the correct token.
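A minimal Python sketch can make this concrete (the vocabulary and probability values below are illustrative, not from the source): with a one-hot target, cross-entropy loss reduces to the negative log-probability of the correct token, so two predictions that assign the same probability to the correct token incur the same loss no matter which incorrect tokens absorb the remaining mass.

```python
import math

# Illustrative vocabulary; "day" (index 2) is the correct next token.
vocab = ["a", "bright", "day", "is", "shining"]
target = [0, 0, 1, 0, 0]  # one-hot ground-truth distribution

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i); with a one-hot p this reduces
    # to -log(q_i) at the single index where p_i = 1.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# Two predicted distributions that both give 0.5 to "day" but spread
# the remaining 0.5 over *different* incorrect tokens.
pred_a = [0.40, 0.05, 0.50, 0.03, 0.02]
pred_b = [0.05, 0.40, 0.50, 0.03, 0.02]

# The losses are identical: the one-hot target makes the loss blind to
# which wrong tokens receive probability, only to the correct token's share.
loss_a = cross_entropy(target, pred_a)
loss_b = cross_entropy(target, pred_b)
assert abs(loss_a - loss_b) < 1e-12
```

Both losses equal -log(0.5), which is exactly the sense in which all incorrect tokens are treated as equally wrong.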
Ch.1 Pre-training - Foundations of Large Language Models
A language model is being trained on a text corpus where it learns to predict the next word in a sequence. The model's entire vocabulary is ordered as follows:
['a', 'bright', 'day', 'is', 'shining']. If the model is given the input context 'a bright' and the actual next word in the training data is 'day', which vector correctly represents the ground-truth target for this specific training step?
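A short Python sketch of how such a ground-truth vector can be built (the helper name `one_hot` is illustrative): the target places 1 at the index of the correct token and 0 everywhere else, so with 'day' at index 2 the vector is [0, 0, 1, 0, 0].

```python
vocab = ["a", "bright", "day", "is", "shining"]

def one_hot(vocab, token):
    # Ground-truth target: 1 at the correct token's index, 0 elsewhere.
    return [1 if w == token else 0 for w in vocab]

print(one_hot(vocab, "day"))  # → [0, 0, 1, 0, 0]
```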
Explaining the Ground-Truth Vector