Multiple Choice

A model is being trained for a text labeling task where the goal is to maximize the probability assigned to the correct label for each word. The training loss is calculated as the average of the negative logarithm of these probabilities. Consider the model's performance on one sentence, evaluated by two different sets of parameters (Model A and Model B). The table below shows the probability each model assigned to the correct label for each of the seven words in the sentence.

Word      Model A Probability    Model B Probability
Word 1    0.9                    0.8
Word 2    0.8                    0.6
Word 3    0.7                    0.6
Word 4    0.9                    0.8
Word 5    0.9                    0.8
Word 6    0.1                    0.7
Word 7    0.9                    0.8

Based on this data, which model would have a lower training loss for this specific sentence, and why?
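One way to check the answer is to compute the stated loss directly: the average of the negative logarithm of the probabilities in the table. A minimal sketch in Python, assuming the natural logarithm (the base only rescales both losses by the same constant, so the comparison is unaffected):

```python
import math

# Probabilities each model assigned to the correct label (from the table above)
model_a = [0.9, 0.8, 0.7, 0.9, 0.9, 0.1, 0.9]
model_b = [0.8, 0.6, 0.6, 0.8, 0.8, 0.7, 0.8]

def avg_neg_log_likelihood(probs):
    """Training loss as defined in the question: mean of -log(p)."""
    return sum(-math.log(p) for p in probs) / len(probs)

loss_a = avg_neg_log_likelihood(model_a)
loss_b = avg_neg_log_likelihood(model_b)
print(f"Model A loss: {loss_a:.3f}")  # ≈ 0.472
print(f"Model B loss: {loss_b:.3f}")  # ≈ 0.324
```

Note how Model A's single low probability (0.1 on Word 6) contributes -ln(0.1) ≈ 2.303 on its own, dominating the sum even though Model A scores higher on most other words; the negative log penalizes confident mistakes heavily.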


Updated 2025-10-04


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science