Learn Before
Debugging a Question-Answering Model Architecture
A team is building a model to extract answers from documents. They adapt an existing architecture that was successful for identifying named entities (like person, organization, location). The original model placed a single classification network on top of the final contextual embeddings to assign a label (e.g., 'B-PER', 'I-PER', 'O') to each token. For the new task, they changed the labels to {'START', 'END', 'INSIDE', 'OUTSIDE'} to mark the answer span. However, the model struggles to identify valid spans, often predicting multiple 'START' tokens or 'END' tokens that don't correspond to a 'START'.
Based on the standard architectural pattern for this type of task, what is the primary conceptual flaw in the team's adapted approach, and what specific modification would better suit the goal of identifying a single, continuous answer span?
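The failure mode described in the scenario can be reproduced with a minimal sketch. It assumes the single classification network emits one score per label for each token and that each token's label is chosen independently by argmax. The function name `predict_token_labels` and the score values are hypothetical, made up for illustration; they are not taken from the team's model.

```python
# The four labels the team adopted for marking the answer span.
LABELS = ["START", "END", "INSIDE", "OUTSIDE"]

def predict_token_labels(logits):
    """Pick the highest-scoring label for each token independently."""
    predictions = []
    for row in logits:
        best_index = max(range(len(LABELS)), key=lambda i: row[i])
        predictions.append(LABELS[best_index])
    return predictions

# Hypothetical per-token scores (one row per token, one column per label).
token_logits = [
    [0.1, 0.0, 0.2, 2.0],
    [1.9, 0.1, 0.3, 0.5],
    [0.2, 0.1, 1.5, 0.4],
    [1.7, 0.2, 0.6, 0.3],
    [0.1, 0.3, 0.2, 1.8],
]

predicted = predict_token_labels(token_logits)
start_count = predicted.count("START")
end_count = predicted.count("END")
print(predicted, start_count, end_count)
```

Because every token is labeled on its own, nothing in this decoding step limits the sequence to exactly one 'START' or requires a later 'END' to match it. With these made-up scores the output contains two 'START' labels and no 'END'.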
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Span Prediction Loss Function
Inference for Span Prediction
Illustration of BERT-based Architecture for Span Prediction
Input Sequence Formatting for Span Prediction
Applying Prediction Networks to Context Token Outputs
An engineer is designing a model to extract answers from a paragraph. The model must identify a continuous segment of text (a 'span') that answers a given question. The model's base component processes the input and produces a contextualized vector representation for each token in the paragraph. Considering the task is to identify the start and end points of the answer, which of the following architectural designs for the final prediction layer is most appropriate?
Comparing Model Architectures for Text Extraction Tasks