1Cademy - Debugging a Question-Answering Model Architecture

Learn Before

BERT-based Architecture for Span Prediction

Case Study

Debugging a Question-Answering Model Architecture

A team is building a model to extract answers from documents. They adapt an existing architecture that was successful at named entity recognition (identifying entities such as persons, organizations, and locations). The original model placed a single classification network on top of the final contextual embeddings to assign a label (e.g., 'B-PER', 'I-PER', 'O') to each token. For the new task, they changed the label set to {'START', 'END', 'INSIDE', 'OUTSIDE'} to mark the answer span. However, the model struggles to identify valid spans, often predicting multiple 'START' tokens, or 'END' tokens that have no matching 'START'.
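To make the reported symptom concrete, here is a minimal sketch of how the team's per-token labeling might be decoded. It is not the team's actual code: the logit values, the helper names `per_token_argmax` and `is_single_valid_span`, and the exact definition of a "valid span" are all assumptions chosen for illustration. The sketch only shows that when each token's label is chosen independently, nothing stops the output from containing two 'START' labels.

```python
# Label order assumed for this sketch; the case study only names the label set.
LABELS = ["START", "END", "INSIDE", "OUTSIDE"]


def per_token_argmax(logits):
    """Pick the highest-scoring label for each token, independently of all others."""
    return [
        LABELS[max(range(len(LABELS)), key=lambda k: row[k])]
        for row in logits
    ]


def is_single_valid_span(labels):
    """Assumed validity rule: exactly one START, exactly one later END,
    INSIDE on every token strictly between them, OUTSIDE everywhere else."""
    if labels.count("START") != 1 or labels.count("END") != 1:
        return False
    start, end = labels.index("START"), labels.index("END")
    if end <= start:
        return False
    return all(
        label == ("INSIDE" if start < i < end else "OUTSIDE")
        for i, label in enumerate(labels)
        if i not in (start, end)
    )


# Hypothetical classifier scores for six tokens, columns in LABELS order.
logits = [
    [0.1, 0.0, 0.2, 2.0],
    [1.8, 0.1, 0.9, 0.3],
    [0.4, 0.2, 1.5, 0.6],
    [1.6, 0.3, 1.4, 0.2],  # a second token that also scores highest on START
    [0.2, 1.9, 0.8, 0.5],
    [0.1, 0.3, 0.2, 2.1],
]

predicted = per_token_argmax(logits)
print(predicted)
print("single valid span:", is_single_valid_span(predicted))
```

With these made-up scores, tokens 1 and 3 both receive 'START', so the validity check fails even though every individual token decision is the locally best one.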

Based on the standard architectural pattern for this type of task, what is the primary conceptual flaw in the team's adapted approach, and what specific modification would better suit the goal of identifying a single, continuous answer span?

Updated 2025-10-03
