Learn Before
Span Prediction Loss Function
A span prediction model is trained by minimizing a loss computed from the outputs of its two prediction networks: one that scores each token as the start of the span and one that scores each token as the end. The total loss is the sum of the negative log-likelihoods from both networks, taken over all tokens in the context passage.
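The summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation. It assumes each network outputs an independent per-token probability and that the gold label is binary per token (1 at the true start or end position, 0 elsewhere). The function name `span_prediction_loss`, its parameters, and the `eps` clamp are hypothetical choices made for this example.

```python
import math


def span_prediction_loss(start_probs, end_probs, gold_start, gold_end, eps=1e-12):
    """Sum the start and end negative log-likelihoods over all context tokens.

    start_probs[j] / end_probs[j]: assumed per-token probability that token j
    is the start / end of the answer span (hypothetical interface).
    gold_start / gold_end: indices of the true start and end tokens.
    """
    if len(start_probs) != len(end_probs):
        raise ValueError("start_probs and end_probs must have the same length")

    total = 0.0
    for j, (p_start, p_end) in enumerate(zip(start_probs, end_probs)):
        # Likelihood the start network assigns to the gold label at position j.
        start_lik = p_start if j == gold_start else 1.0 - p_start
        # Likelihood the end network assigns to the gold label at position j.
        end_lik = p_end if j == gold_end else 1.0 - p_end
        # Clamp to eps so that log(0) never occurs (an assumption of this sketch).
        total += -math.log(max(start_lik, eps)) - math.log(max(end_lik, eps))
    return total


# A three-token context whose answer span runs from token 0 to token 2.
good = span_prediction_loss([0.9, 0.1, 0.1], [0.1, 0.1, 0.9], gold_start=0, gold_end=2)
bad = span_prediction_loss([0.1, 0.9, 0.1], [0.9, 0.1, 0.1], gold_start=0, gold_end=2)
```

Under these assumptions, `good` is smaller than `bad`, because the second call places high probability on the wrong start and end positions. A model that instead normalizes over positions with a softmax would use a different per-network likelihood, but the two terms would still be added in the same way.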

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.1 Pre-training - Foundations of Large Language Models
Related
Span Prediction Loss Function
Inference for Span Prediction
Illustration of BERT-based Architecture for Span Prediction
Input Sequence Formatting for Span Prediction
Applying Prediction Networks to Context Token Outputs
An engineer is designing a model to extract answers from a paragraph. The model must identify a continuous segment of text (a 'span') that answers a given question. The model's base component processes the input and produces a contextualized vector representation for each token in the paragraph. Considering the task is to identify the start and end points of the answer, which of the following architectural designs for the final prediction layer is most appropriate?
Debugging a Question-Answering Model Architecture
Comparing Model Architectures for Text Extraction Tasks
Learn After
Span Prediction Loss Formula
A question-answering model is being trained to identify a specific answer span within a passage. The model's training objective is to minimize a loss calculated from two separate predictions for each token: the probability of it being the start of the answer and the probability of it being the end. The total loss is calculated by summing the negative log-likelihoods from both prediction networks. In which of the following scenarios would the model incur the highest training loss for a single training example?
Analyzing Span Prediction Model Loss
Rationale for Combined Span Prediction Loss