Learn Before
Trade-offs in Memory Update Strategies
A language model is designed to process a 10,000-token document. Instead of updating its memory state after each of the 10,000 tokens, it is configured to update its memory only after processing each chunk of 250 tokens, i.e., 40 updates in total rather than 10,000. Explain the primary computational advantage of this approach and a potential drawback related to information flow.
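The arithmetic behind this trade-off can be made concrete with a small sketch. The code below is a hypothetical illustration, not the model from the exercise: the function `process_document` and the averaged memory update are stand-ins for whatever recurrent update (for example, a FIFO function as in the related example) a real architecture would use.

```python
import torch

def process_document(tokens: torch.Tensor, chunk_size: int = 250) -> torch.Tensor:
    """Run a (seq_len, d_model) document through a toy chunked-memory model.

    The memory state is rewritten only at chunk boundaries, so a
    10,000-token document triggers 40 updates instead of 10,000.
    """
    d_model = tokens.size(-1)
    memory = torch.zeros(d_model)           # summary of everything seen so far
    outputs = []
    for chunk in tokens.split(chunk_size):  # 40 chunks for 10,000 tokens
        # Computational advantage: every token in the chunk conditions on
        # the same frozen memory, so the whole chunk is processed in one
        # parallel step instead of 250 sequential memory updates.
        outputs.append(chunk + memory)
        # Information-flow drawback: tokens inside this chunk never see a
        # memory that reflects their own chunk-mates; the summary they read
        # can be up to chunk_size - 1 tokens stale.
        memory = 0.5 * memory + 0.5 * chunk.mean(dim=0)  # toy recurrent update
    return torch.cat(outputs, dim=0)

out = process_document(torch.randn(10_000, 64))
print(out.shape)  # torch.Size([10000, 64])
```

With per-token updates, the loop body would run 10,000 times and each step would depend on the previous one; with 250-token chunks it runs 40 times and each chunk is fully parallel internally, at the cost of the staleness within a chunk noted in the comments.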
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
FIFO Function as a Memory Update Example
Two-Segment Memory in Segment-Level Recurrence
Recurrent Memory Update using Segments
A language model is designed to process very long documents. Two memory update strategies are being considered. Strategy A updates the model's memory after processing each individual input unit. Strategy B updates the memory only after processing a block of 128 consecutive input units. What is the primary trade-off when choosing Strategy B over Strategy A?
A language model processes text by grouping it into non-overlapping blocks of 128 tokens. The model's memory is updated only after an entire block is processed. A developer observes that the model frequently fails to capture dependencies between the last word of one block and the first word of the very next block. What is the most direct cause of this specific issue?
Trade-offs in Memory Update Strategies
Optimizing a Language Model for Long Document Processing