Case Study

Debugging a Question-Answering Model Architecture

A team is building a model to extract answers from documents. They adapt an existing architecture that was successful at named entity recognition (identifying entities such as persons, organizations, and locations). The original model placed a single classification network on top of the final contextual embeddings and assigned a label (e.g., 'B-PER', 'I-PER', 'O') to each token. For the new task, they replaced the labels with {'START', 'END', 'INSIDE', 'OUTSIDE'} to mark the answer span. However, the model struggles to produce valid spans: it often predicts multiple 'START' tokens, or 'END' tokens with no matching 'START'.
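To make the failure mode concrete, here is a minimal Python sketch of the setup as described. Each token's label is chosen independently by argmax over the four tags, and a separate check tests whether the resulting tag sequence forms one valid span. The score values and the helper names `tag_tokens` and `is_single_valid_span` are hypothetical illustrations, not the team's code. The check assumes an answer covers at least two tokens, since under this label set a one-token answer cannot be tagged both 'START' and 'END'.

```python
# Label set from the case study; order defines column order in the score rows.
LABELS = ["START", "END", "INSIDE", "OUTSIDE"]


def tag_tokens(scores):
    """Pick each token's label independently via argmax over its score row."""
    return [LABELS[max(range(len(row)), key=row.__getitem__)] for row in scores]


def is_single_valid_span(tags):
    """Return True only if the tags form exactly one START ... END span
    with INSIDE between them and OUTSIDE everywhere else."""
    if tags.count("START") != 1 or tags.count("END") != 1:
        return False
    s, e = tags.index("START"), tags.index("END")
    if s > e:
        return False  # END appears before START
    between_ok = all(t == "INSIDE" for t in tags[s + 1:e])
    outside_ok = all(t == "OUTSIDE" for t in tags[:s] + tags[e + 1:])
    return between_ok and outside_ok


# Hypothetical classifier scores for a 6-token input (illustrative numbers only).
scores = [
    [0.1, 0.0, 0.2, 0.9],
    [0.8, 0.1, 0.3, 0.2],
    [0.2, 0.1, 0.7, 0.3],
    [0.6, 0.2, 0.5, 0.1],
    [0.1, 0.9, 0.2, 0.3],
    [0.1, 0.7, 0.2, 0.4],
]

tags = tag_tokens(scores)
print(tags)                        # ['OUTSIDE', 'START', 'INSIDE', 'START', 'END', 'END']
print(is_single_valid_span(tags))  # False
```

Because per-token argmax does not couple one token's label to any other token's, nothing stops the decoder from emitting two 'START' tags, as in this example. The check only detects an invalid sequence; it does not prevent one.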

Based on the standard architectural pattern for this type of task, what is the primary conceptual flaw in the team's adapted approach, and what specific modification would better suit the goal of identifying a single, continuous answer span?

Updated 2025-10-03

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy
