Architectural Choice for a Long-Document Q&A System
An AI development team is building a system to answer highly specific questions about lengthy legal documents. Their initial model attends to every previous token, which makes it too slow and memory-intensive. They are considering two alternative approaches:
- Approach 1: Implement a mechanism where each new token only computes relationships with a limited, strategically selected subset of important previous tokens from across the entire document.
- Approach 2: Implement a separate component that periodically reads a chunk of the oldest token information and compresses it into a single, fixed-size summary representation, which is then made available for processing.
For the task of answering highly specific questions that may depend on precise details from the beginning of a long document, which approach is more suitable? Justify your reasoning by explaining the primary risk associated with the less suitable approach in this context.
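The contrast between the two approaches can be made concrete with a toy sketch. This is a minimal illustration, not a production attention implementation: the data is random, the "strategic selection" in Approach 1 is approximated by simple top-k scoring, and the compression in Approach 2 is approximated by a mean over the oldest chunk. It shows why Approach 2 is lossy at the token level, which is the core risk for questions that hinge on precise early-document details.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy stream of per-token key/value vectors (hypothetical data).
rng = np.random.default_rng(0)
d, n = 8, 64
keys = rng.normal(size=(n, d))
values = rng.normal(size=(n, d))
query = rng.normal(size=d)          # the new token's query

# Approach 1 (sparse attention): the new token attends only to a
# strategically chosen subset of past positions (here, the top-k by
# score). Any token, including one from the start of the document,
# can still be attended to directly if it is selected.
scores = keys @ query
topk = np.argsort(scores)[-8:]      # keep the 8 highest-scoring tokens
sparse_out = softmax(scores[topk]) @ values[topk]

# Approach 2 (compressive memory): the oldest chunk of key/value
# pairs is collapsed into a single fixed-size summary (here, a mean).
# Token-level detail from the beginning of the document is no longer
# individually recoverable -- only the lossy summary remains.
old, recent = slice(0, 48), slice(48, n)
summary_k = keys[old].mean(axis=0, keepdims=True)
summary_v = values[old].mean(axis=0, keepdims=True)
comp_keys = np.concatenate([summary_k, keys[recent]])
comp_vals = np.concatenate([summary_v, values[recent]])
comp_out = softmax(comp_keys @ query) @ comp_vals

print(sparse_out.shape, comp_out.shape)
```

Note that after compression the first 48 tokens contribute only one row (`summary_k`) to attention, so a question about one specific early clause can no longer isolate that clause; under sparse attention the relevant early token can still be selected individually.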
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is developing a language model designed to process extremely long sequences, but they are constrained by the computational cost of storing and attending to every previous token's key-value pair. They are evaluating two distinct architectural solutions:
- Solution A: Modify the attention mechanism itself so that each token only attends to a strategically chosen subset of previous tokens, rather than all of them.
- Solution B: Introduce a separate, fixed-size data structure that periodically summarizes and compresses the key-value pairs from older tokens into a condensed representation.
Which statement best analyzes the fundamental difference in how these two solutions address the long-sequence problem?
Architectural Trade-offs for Long-Context Summarization