Short Answer

Analysis of Span Prediction Loss

A language model is trained on a question-answering task where it must identify the start and end tokens of an answer span. For a specific training example, the correct start and end tokens are both at position 5.

  • Model A assigns probability 0.8 to the start token being at position 5 and probability 0.7 to the end token being at position 5.
  • Model B assigns probability 0.5 to the start token being at position 5 and probability 0.6 to the end token being at position 5.

Analyze which model will have a lower loss value for this specific example and explain your reasoning based on the components of the loss calculation.
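
One way to check the comparison numerically is sketched below. It assumes the loss for a single example is the sum of two cross-entropy terms, the negative log-probability of the correct start position plus the negative log-probability of the correct end position, which is a common formulation for span prediction but is not stated in the question. The function name `span_loss` is hypothetical and used only for illustration. Some implementations average the two terms instead of summing them; that halves both values and leaves the ranking unchanged.

```python
import math

def span_loss(p_start: float, p_end: float) -> float:
    """Assumed per-example loss: -log p(correct start) - log p(correct end)."""
    return -(math.log(p_start) + math.log(p_end))

# Probabilities each model assigns to the correct position (position 5).
loss_a = span_loss(0.8, 0.7)
loss_b = span_loss(0.5, 0.6)

print(f"Model A loss: {loss_a:.3f}")  # Model A loss: 0.580
print(f"Model B loss: {loss_b:.3f}")  # Model B loss: 1.204
```

Under this assumption, both loss terms are smaller for Model A because it assigns higher probability to the correct position for the start and for the end, so its total loss is lower.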


Updated 2025-10-08


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science