Analysis of Span Prediction Loss
A language model is trained on a question-answering task where it must identify the start and end tokens of an answer span. For a specific training example, the correct start and end tokens are both at position 5.
- Model A predicts the probability of the start token being at position 5 is 0.8, and the end token at position 5 is 0.7.
- Model B predicts the probability of the start token being at position 5 is 0.5, and the end token at position 5 is 0.6.
Analyze which model will have a lower loss value for this specific example and explain your reasoning based on the components of the loss calculation.
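A quick numeric check helps here. The sketch below assumes the loss for one example is the sum of the negative log-likelihoods of the gold start and end positions, with the natural logarithm. This is the same formulation stated in the related question about tokens 25 and 28. The helper name `span_loss` is hypothetical and only used for illustration.

```python
import math


def span_loss(p_start: float, p_end: float) -> float:
    """Hypothetical helper: -(ln p_start + ln p_end) for the gold positions."""
    return -(math.log(p_start) + math.log(p_end))


# Probabilities each model assigns to the correct position (5) for start and end.
models = {
    "Model A": (0.8, 0.7),
    "Model B": (0.5, 0.6),
}

losses = {name: span_loss(ps, pe) for name, (ps, pe) in models.items()}
for name, loss in losses.items():
    print(f"{name}: loss = {loss:.3f}")
# Model A: loss = 0.580
# Model B: loss = 1.204
```

Each term -ln p shrinks as p approaches 1. Model A's start term is about 0.223, compared with 0.693 for Model B. Its end term is about 0.357, compared with 0.511 for Model B. Because both components are smaller for Model A, its summed loss is lower.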
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is processing a single training example for a question-answering task. The correct answer span begins at token 25 and ends at token 28. The model predicts the probability of token 25 being the start as 0.6, and the probability of token 28 being the end as 0.7. Using the standard loss calculation for this task, which sums the negative log-likelihoods of the correct start and end positions (Loss = -(log p_start + log p_end)), what is the loss value for this example? (Use the natural logarithm, ln, and round to three decimal places.)
Comparing Model Performance via Loss Calculation
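The related question about tokens 25 and 28 can be checked directly. This minimal sketch plugs its two probabilities into Loss = -(ln p_start + ln p_end) and rounds to three decimal places, as the question asks. The variable names are illustrative only.

```python
import math

# Probabilities the model assigns to the gold positions (from the question).
p_start = 0.6  # P(start = token 25)
p_end = 0.7    # P(end = token 28)

# Loss = -(ln p_start + ln p_end), using the natural logarithm.
loss = -(math.log(p_start) + math.log(p_end))
print(round(loss, 3))  # 0.868
```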