Learn Before
Evaluating Model Prediction Quality
A language model is being trained to complete the sentence 'The sky is...'. The training data indicates that the correct next word is 'blue'. The model's performance is measured by a loss value that quantifies the gap between its predicted probability distribution and the correct outcome; a lower value indicates a better prediction.
Two different versions of the model produce the following probability distributions for the next word:
- Model Alpha: Assigns a probability of 0.8 to 'blue', 0.1 to 'green', and 0.1 to 'cloudy'.
- Model Beta: Assigns a probability of 0.3 to 'blue', 0.5 to 'cloudy', and 0.2 to 'green'.
Which model, Alpha or Beta, would be assigned a lower performance penalty (i.e., a lower loss value) for this specific prediction? Justify your reasoning.
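One way to check the comparison numerically is sketched below. The question does not name the loss function, so this sketch assumes the common choice of cross-entropy with a one-hot target, which reduces to the negative natural log of the probability assigned to the correct word. The function name `cross_entropy_loss` and the dictionary layout are illustrative, not taken from the course material.

```python
import math

def cross_entropy_loss(predicted, correct_word):
    """Assumed loss: negative natural log of the probability given to the correct word."""
    return -math.log(predicted[correct_word])

# Probability distributions from the question.
model_alpha = {"blue": 0.8, "green": 0.1, "cloudy": 0.1}
model_beta = {"blue": 0.3, "cloudy": 0.5, "green": 0.2}

loss_alpha = cross_entropy_loss(model_alpha, "blue")
loss_beta = cross_entropy_loss(model_beta, "blue")

print(f"Alpha loss: {loss_alpha:.4f}")  # about 0.2231
print(f"Beta loss:  {loss_beta:.4f}")   # about 1.2040
```

Under this assumption only the probability on 'blue' matters: how the remaining probability mass is split between 'green' and 'cloudy' does not change the loss, so the model that puts more weight on the correct word receives the smaller penalty.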
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained to predict the next word in a sequence. The training process aims to minimize a loss value, which measures the difference between the model's predicted probability distribution for the next word and the correct word. Consider two separate predictions for the next word after the phrase 'The sun is shining...':
- Prediction A: The model assigns a probability of 0.75 to the correct word, 'brightly'.
- Prediction B: The model assigns a probability of 0.15 to the correct word, 'brightly'.
Which of the following statements accurately analyzes the loss values for these two predictions?
Total Loss Calculation for a Token Sequence
Evaluating Model Prediction Quality
Defining the Ground Truth Distribution