Comparing Model Predictions via Loss
A language model is tasked with predicting the next word in the sentence 'The cat sat on the ___'. The correct next word is 'mat'. Two different models, Model A and Model B, produce the following probability distributions for the next word:
- Model A: P('mat') = 0.8, P('rug') = 0.1, P('floor') = 0.05, P('chair') = 0.05
- Model B: P('mat') = 0.2, P('rug') = 0.3, P('floor') = 0.3, P('chair') = 0.2
Based on the principle of minimizing the difference between the predicted and the true probability distribution, which model is performing better on this specific example? Explain your reasoning.
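One way to make the comparison concrete is to compute a loss for each model. The sketch below assumes the "difference" is measured with cross-entropy against a one-hot true distribution (probability 1 on 'mat'), using the natural logarithm; under that assumption the loss reduces to -log P('mat'). The function name `cross_entropy` and the dictionary layout are illustrative choices, not part of the original question.

```python
import math

# The correct next word in the example sentence.
target = "mat"

# Predicted distributions copied from the question.
model_a = {"mat": 0.8, "rug": 0.1, "floor": 0.05, "chair": 0.05}
model_b = {"mat": 0.2, "rug": 0.3, "floor": 0.3, "chair": 0.2}


def cross_entropy(predicted, target_word):
    """Cross-entropy against a one-hot target (assumed loss): -log P(target_word)."""
    return -math.log(predicted[target_word])


loss_a = cross_entropy(model_a, target)
loss_b = cross_entropy(model_b, target)

print(f"Model A loss: {loss_a:.3f}")  # 0.223
print(f"Model B loss: {loss_b:.3f}")  # 1.609
```

Under these assumptions, Model A has the lower loss because it assigns far more probability to the correct word, so its prediction is closer to the true distribution on this example.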
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained to predict the next word in a sequence. For a given training example, a loss function is used to measure the difference between the model's predicted probability distribution over the vocabulary and the true distribution, where the true next word has a probability of 1. If the calculated loss is very high for this example, what does this most accurately indicate?
Evaluating Model Performance via Loss