Analyzing Span Prediction Model Loss
A question-answering model is trained to identify answer spans in a text. For each token in the text, the model makes two separate predictions: the probability that the token is the start of the answer and the probability that it is the end. The training objective is to minimize the total loss, which is the sum of the negative log-likelihoods that the two prediction networks assign to the correct start token and the correct end token.
Consider two different training instances:
- Instance 1: The model assigns a very high probability to the correct start token and the correct end token.
- Instance 2: The model assigns a very low probability to the correct start token and the correct end token.
Which instance will result in a significantly higher loss value during training, and why is this the case?
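To make the comparison concrete, the sketch below computes the summed negative log-likelihood for both instances. The probability values and the helper name `span_loss` are assumptions chosen only for illustration; the question itself does not give specific numbers.

```python
import math


def span_loss(p_start: float, p_end: float) -> float:
    """Sum of the negative log-likelihoods of the correct start and end tokens.

    p_start and p_end are the probabilities the model assigns to the
    correct start token and the correct end token, respectively.
    """
    return -math.log(p_start) - math.log(p_end)


# Hypothetical probabilities, not taken from the source.
loss_instance_1 = span_loss(0.95, 0.90)  # high probability on the correct tokens
loss_instance_2 = span_loss(0.02, 0.01)  # low probability on the correct tokens

print(f"Instance 1 loss: {loss_instance_1:.3f}")
print(f"Instance 2 loss: {loss_instance_2:.3f}")
```

With these assumed values, Instance 2 produces a loss dozens of times larger than Instance 1, because the negative logarithm grows without bound as the probability of the correct token approaches zero and approaches zero as that probability approaches one.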
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Span Prediction Loss Formula
A question-answering model is being trained to identify a specific answer span within a passage. The model's training objective is to minimize a loss calculated from two separate predictions for each token: the probability of it being the start of the answer and the probability of it being the end. The total loss is calculated by summing the negative log-likelihoods from both prediction networks. In which of the following scenarios would the model incur the highest training loss for a single training example?
Analyzing Span Prediction Model Loss
Rationale for Combined Span Prediction Loss