Short Answer

Impact of Model Confidence on Training Loss

A model is being trained to identify named entities in text. The training process uses a loss function calculated as the average negative logarithm of the probability assigned to the correct entity tag for each word. Consider two individual words from a training sentence. For Word A, the model assigns a probability of 0.99 to the correct tag. For Word B, the model assigns a probability of 0.01 to the correct tag. Analyze and compare the contribution of each of these words to the total training loss. Which word will have a significantly larger impact on the loss, and why is this behavior desirable for training the model?
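The comparison in the question can be checked numerically. Below is a minimal sketch (the function name `tag_loss` is illustrative, not from the source) that computes each word's per-word contribution, the negative natural logarithm of the probability assigned to the correct tag:

```python
import math

def tag_loss(p_correct: float) -> float:
    """Per-word loss contribution: negative natural log of the
    probability the model assigns to the correct entity tag."""
    return -math.log(p_correct)

# Word A: model is confident and correct -> near-zero loss
loss_a = tag_loss(0.99)
# Word B: model assigns almost no probability to the correct tag -> large loss
loss_b = tag_loss(0.01)

print(f"Word A loss: {loss_a:.4f}")   # approx. 0.0101
print(f"Word B loss: {loss_b:.4f}")   # approx. 4.6052
print(f"Ratio B/A:   {loss_b / loss_a:.0f}x")
```

Word B's contribution is several hundred times larger than Word A's, which illustrates why the negative-log loss focuses the gradient updates on confidently wrong predictions rather than on words the model already handles well.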


Updated 2025-10-08


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science