Case Study

Architectural Choice for a Long-Document Q&A System

An AI development team is building a system to answer highly specific questions about lengthy legal documents. Their initial model, which attends to every previous token when processing each new one, is too slow and memory-intensive for long inputs. They are considering two alternative approaches:

  1. Approach 1: Implement a mechanism where each new token only computes relationships with a limited, strategically selected subset of important previous tokens from across the entire document.
  2. Approach 2: Implement a separate component that periodically reads a chunk of the oldest token information and compresses it into a single, fixed-size summary representation, which is then made available for processing.

For the task of answering highly specific questions that may depend on precise details from the beginning of a long document, which approach is more suitable? Justify your reasoning by explaining the primary risk associated with the less suitable approach in this context.
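To make the contrast concrete, the two mechanisms can be sketched as follows. This is a minimal illustration in NumPy, not a real implementation: the top-k score heuristic for selecting "important" tokens (Approach 1) and mean pooling as the compression operator (Approach 2) are simplifying assumptions chosen for brevity.

```python
import numpy as np

def sparse_attend(query, keys, values, k=4):
    """Approach 1: attend only to the k highest-scoring previous tokens,
    wherever they appear in the document (top-k selection heuristic).
    The selected tokens' representations are kept exactly, not summarized."""
    scores = keys @ query                       # relevance of each previous token
    top = np.argsort(scores)[-k:]               # indices of the k most relevant tokens
    w = np.exp(scores[top] - scores[top].max()) # softmax over the selected subset
    w /= w.sum()
    return w @ values[top]                      # weighted mix of only k value vectors

def compress_oldest(memory, chunk_size=8):
    """Approach 2: replace the oldest chunk of token states with a single
    fixed-size summary vector (mean pooling here -- a lossy compression,
    so fine-grained detail from the start of the document is discarded)."""
    if len(memory) <= chunk_size:
        return memory
    summary = memory[:chunk_size].mean(axis=0, keepdims=True)
    return np.concatenate([summary, memory[chunk_size:]], axis=0)

rng = np.random.default_rng(0)
d = 16
keys = rng.normal(size=(32, d))
values = rng.normal(size=(32, d))
query = rng.normal(size=d)

out = sparse_attend(query, keys, values, k=4)  # exact early-token vectors can survive
mem = compress_oldest(values, chunk_size=8)    # oldest 8 token states -> 1 summary
print(out.shape, mem.shape)                    # (16,) (25, 16)
```

Note how the sketch exposes the trade-off the question targets: sparse selection preserves individual early tokens verbatim if they score highly, while compression irreversibly pools them into one vector.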

Updated 2025-10-10

Tags: Ch.2 Generative Models - Foundations of Large Language Models