Learn Before
A model is being trained for a text labeling task where the goal is to maximize the probability assigned to the correct label for each word. The training loss is calculated as the average of the negative logarithm of these probabilities. Consider the model's performance on one sentence, evaluated under two different sets of parameters (Model A and Model B). The table below shows the probability each model assigned to the correct label for each of the seven words in the sentence.
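In symbols (writing $p_i$ for the probability the model assigns to the correct label of word $i$, and $N = 7$ for the number of words in the sentence), the loss described above is:

$$\text{Loss} = -\frac{1}{N} \sum_{i=1}^{N} \log p_i$$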
| Word | Model A Probability | Model B Probability |
|---|---|---|
| Word 1 | 0.9 | 0.8 |
| Word 2 | 0.8 | 0.6 |
| Word 3 | 0.7 | 0.6 |
| Word 4 | 0.9 | 0.8 |
| Word 5 | 0.9 | 0.8 |
| Word 6 | 0.1 | 0.7 |
| Word 7 | 0.9 | 0.8 |
Based on this data, which model would have a lower training loss for this specific sentence, and why?
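As a quick check of the arithmetic, here is a minimal Python sketch that computes this average negative log probability for each column of the table. The variable and function names are illustrative, not from any particular library.

```python
import math

# Probability each model assigned to the correct label, per word (from the table above)
probs_a = [0.9, 0.8, 0.7, 0.9, 0.9, 0.1, 0.9]  # Model A
probs_b = [0.8, 0.6, 0.6, 0.8, 0.8, 0.7, 0.8]  # Model B

def avg_neg_log_prob(probs):
    """Training loss as defined above: the mean of -log(p) over all words."""
    return -sum(math.log(p) for p in probs) / len(probs)

print(f"Model A loss: {avg_neg_log_prob(probs_a):.4f}")  # ≈ 0.4720
print(f"Model B loss: {avg_neg_log_prob(probs_b):.4f}")  # ≈ 0.3244
```

Because $-\log p$ grows without bound as $p \to 0$, a single very low probability (such as Model A's 0.1 on Word 6) can dominate the average, even when every other prediction is more confident.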
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Calculating Model Training Loss
Impact of Model Confidence on Training Loss