1Cademy - Span Prediction Loss Function

Learn Before

BERT-based Architecture for Span Prediction

Concept

Span Prediction Loss Function

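A span prediction model is trained by minimizing a loss computed from the outputs of its start-of-span and end-of-span prediction networks. For each token in the context passage, the start network gives the probability that the token begins the answer span, and the end network gives the probability that it ends the span. The total loss is the sum of the negative log-likelihoods from both networks over all tokens in the context passage.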
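The summation described above can be illustrated with a minimal sketch. It assumes each network outputs an independent probability per token and that the gold answer is given as a start index and an end index. The function name `span_prediction_loss`, its parameters, and the clamping constant `eps` are hypothetical choices made for illustration and do not come from the source.

```python
import math


def span_prediction_loss(start_probs, end_probs, gold_start, gold_end, eps=1e-12):
    """Sum per-token negative log-likelihoods from the start and end predictors.

    start_probs[j] / end_probs[j]: assumed probability that token j starts / ends
    the answer span. gold_start / gold_end: indices of the true span boundaries.
    """
    loss = 0.0
    for j, (p_start, p_end) in enumerate(zip(start_probs, end_probs)):
        # Probability each network assigns to the correct label for token j:
        # "is the boundary" at the gold index, "is not the boundary" elsewhere.
        q_start = p_start if j == gold_start else 1.0 - p_start
        q_end = p_end if j == gold_end else 1.0 - p_end
        # Clamp to avoid log(0), then add both negative log-likelihoods.
        loss += -math.log(max(q_start, eps)) - math.log(max(q_end, eps))
    return loss


# Example: a four-token passage whose answer spans tokens 1..2.
start_probs = [0.1, 0.8, 0.1, 0.1]
end_probs = [0.1, 0.1, 0.7, 0.1]
loss_correct = span_prediction_loss(start_probs, end_probs, gold_start=1, gold_end=2)
loss_wrong = span_prediction_loss(start_probs, end_probs, gold_start=0, gold_end=3)
```

In this sketch the loss is small when both networks place high probability on the true boundaries and grows when either network is confidently wrong. Many implementations instead apply a softmax over positions and sum only the negative log-probabilities at the gold start and end indices. That is a different formulation from the per-token sum sketched here.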

Updated 2026-04-18

References

Reference of Foundations of Large Language Models Course

Learn After

Span Prediction Loss Formula
A question-answering model is being trained to identify a specific answer span within a passage. The model's training objective is to minimize a loss calculated from two separate predictions for each token: the probability of it being the start of the answer and the probability of it being the end. The total loss is calculated by summing the negative log-likelihoods from both prediction networks. In which of the following scenarios would the model incur the highest training loss for a single
Analyzing Span Prediction Model Loss
Rationale for Combined Span Prediction Loss
