Learn Before
Defining the Ground Truth Distribution
A language model with the vocabulary ['mat', 'hat', 'sat', 'cat', 'the'] is predicting the next word for the context 'The cat sat on the'. The correct next word is 'mat'. To calculate the error for this prediction, a loss function compares the model's predicted probability distribution to a 'gold' (ground truth) distribution. Describe the gold distribution for this specific case and represent it as a vector whose entries follow the vocabulary order given above.
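One way to check an answer to this question is to build the vector in code. The sketch below assumes the gold distribution is one-hot (all probability mass on the correct word) and that its entries follow the vocabulary order listed in the question. The helper name `one_hot` is hypothetical and chosen only for illustration.

```python
# Vocabulary in the order given in the question.
vocab = ["mat", "hat", "sat", "cat", "the"]


def one_hot(target, vocabulary):
    """Return a distribution that puts probability 1 on `target` and 0 elsewhere.

    Hypothetical helper for illustration; assumes each word appears once.
    """
    if target not in vocabulary:
        raise ValueError(f"{target!r} is not in the vocabulary")
    return [1.0 if word == target else 0.0 for word in vocabulary]


# The correct next word for 'The cat sat on the' is 'mat'.
gold = one_hot("mat", vocab)
print(gold)  # [1.0, 0.0, 0.0, 0.0, 0.0]
```

The position of the 1 depends entirely on where 'mat' sits in the vocabulary, so a different ordering would give a different vector for the same gold word.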
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained to predict the next word in a sequence. The training process aims to minimize a loss value, which measures the difference between the model's predicted probability distribution for the next word and the actual correct word. Consider two separate predictions for the next word after the phrase 'The sun is shining...':
- Prediction A: The model assigns a probability of 0.75 to the correct word, 'brightly'.
- Prediction B: The model assigns a probability of 0.15 to the correct word, 'brightly'.
Which of the following statements accurately analyzes the loss values for these two predictions?
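The comparison between Prediction A and Prediction B can be checked numerically. The question does not name the loss function, so this sketch assumes the common choice of cross-entropy against a one-hot gold distribution, which reduces to the negative natural log of the probability assigned to the correct word. The function name `nll` is hypothetical.

```python
import math


def nll(p_correct):
    """Negative log-likelihood of the correct word (assumed loss, natural log)."""
    return -math.log(p_correct)


loss_a = nll(0.75)  # Prediction A: 0.75 on 'brightly'
loss_b = nll(0.15)  # Prediction B: 0.15 on 'brightly'

print(f"Prediction A loss: {loss_a:.3f}")  # Prediction A loss: 0.288
print(f"Prediction B loss: {loss_b:.3f}")  # Prediction B loss: 1.897
```

Under this assumption, the loss falls as the probability assigned to the correct word rises, so Prediction A incurs the lower loss.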
Total Loss Calculation for a Token Sequence
Evaluating Model Prediction Quality
Defining the Ground Truth Distribution