Short Answer

Impact of Model Confidence on Training Loss

A model is being trained to identify named entities in text. The training process uses a loss function calculated as the average negative logarithm of the probability assigned to the correct entity tag for each word. Consider two individual words from a training sentence. For Word A, the model assigns a probability of 0.99 to the correct tag. For Word B, the model assigns a probability of 0.01 to the correct tag. Analyze and compare the contribution of each of these words to the total training loss. Which word will have a significantly larger impact on the loss, and why is this behavior desirable for training the model?
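The comparison in the question can be checked numerically. Below is a minimal sketch (the function name `tag_loss` is illustrative, not from the source) that computes each word's per-word contribution, the negative natural logarithm of the probability assigned to the correct tag:

```python
import math

def tag_loss(p_correct: float) -> float:
    """Per-word loss contribution: negative natural log of the
    probability the model assigns to the correct entity tag."""
    return -math.log(p_correct)

# Word A: model is confident and correct -> near-zero loss
loss_a = tag_loss(0.99)
# Word B: model assigns almost no probability to the correct tag -> large loss
loss_b = tag_loss(0.01)

print(f"Word A loss: {loss_a:.4f}")   # approx. 0.0101
print(f"Word B loss: {loss_b:.4f}")   # approx. 4.6052
print(f"Ratio B/A:   {loss_b / loss_a:.0f}x")
```

Word B's contribution is several hundred times larger than Word A's, which illustrates why the negative-log loss focuses the gradient updates on confidently wrong predictions rather than on words the model already handles well.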


Updated 2025-10-08


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science