Learn Before
Two-Segment Memory in Segment-Level Recurrence
A common implementation of segment-level recurrence maintains a memory spanning two segments: the current segment and the one immediately preceding it. The attention mechanism at any position can therefore access the cached key-value pairs of these two most recent consecutive segments. This creates a sliding form of local memory and has been widely adopted in segment-level recurrent models.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
FIFO Function as a Memory Update Example
Two-Segment Memory in Segment-Level Recurrence
Recurrent Memory Update using Segments
A language model is designed to process very long documents. Two memory update strategies are being considered. Strategy A updates the model's memory after processing each individual input unit. Strategy B updates the memory only after processing a block of 128 consecutive input units. What is the primary trade-off when choosing Strategy B over Strategy A?
A language model processes text by grouping it into non-overlapping blocks of 128 tokens. The model's memory is updated only after an entire block is processed. A developer observes that the model frequently fails to capture dependencies between the last word of one block and the first word of the very next block. What is the most direct cause of this specific issue?
Trade-offs in Memory Update Strategies
Optimizing a Language Model for Long Document Processing
Learn After
A language model processes a long document by dividing it into 10 equal, non-overlapping segments. To maintain context, the model's attention mechanism at any point can access information from the segment it is currently processing as well as the single segment that came immediately before it. If the model is currently processing Segment 6, which segments' information is available to its attention mechanism?
Analyzing Context Limitations in a Recurrent Model
Analyzing Memory Trade-offs in Segment-Level Recurrence
Compressive Transformer Memory Architecture