Case Study

Analyzing Span Prediction Model Loss

A question-answering model is trained to identify answer spans in a text. For each token in the text, it makes two separate predictions: the probability that the token is the start of the answer and the probability that it is the end. Training minimizes a loss that sums the negative log-likelihoods of the correct start token and the correct end token, one term from each prediction network.
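The loss described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual implementation: the function name `span_loss`, the four-token passage, and the probability values are all hypothetical, and the two lists stand in for the outputs of the start and end prediction networks.

```python
import math

def span_loss(p_start, p_end, start_idx, end_idx):
    """Sum of the negative log-likelihoods of the correct start and end tokens.

    p_start[i] and p_end[i] are the predicted probabilities that token i is
    the start or the end of the answer (hypothetical network outputs).
    """
    start_nll = -math.log(p_start[start_idx])
    end_nll = -math.log(p_end[end_idx])
    return start_nll + end_nll

# Hypothetical 4-token passage whose answer spans tokens 1 through 2.
p_start = [0.1, 0.6, 0.2, 0.1]  # assumed start-position distribution
p_end = [0.1, 0.2, 0.5, 0.2]    # assumed end-position distribution

loss = span_loss(p_start, p_end, start_idx=1, end_idx=2)
print(f"span loss: {loss:.4f}")
```

Only the probabilities assigned to the correct start and end positions enter the loss; the values at the other positions matter only through the fact that each distribution sums to 1.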

Consider two different training instances:

  • Instance 1: The model assigns a very high probability to the correct start token and the correct end token.
  • Instance 2: The model assigns a very low probability to the correct start token and the correct end token.

Which instance will result in a significantly higher loss value during training, and why is this the case?

Updated 2025-10-02
