Learn Before
Fixed-Size Window Memory as a Form of Local Attention
A simple and effective method for creating a fixed-size memory component in attention mechanisms is to use a sliding window. This approach, a form of local attention, considers only a limited, constant number of the most recent key and value pairs. By restricting attention to this local neighborhood, the memory size is capped and does not grow with the input sequence length.
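To make this concrete, below is a minimal sketch of a sliding-window key/value cache. It is illustrative only, not code from the course: the class name SlidingWindowKVCache and its methods are assumptions, and the window size is named n_c to match the related concept "Window Size (n_c)". The cache keeps at most n_c key/value pairs and attends only within that window, so the memory stays constant however long the sequence becomes.

```python
# Minimal sketch (assumed names, not the course's reference code) of a
# sliding-window KV cache: only the n_c most recent key/value pairs are
# kept, so memory stays constant regardless of sequence length.
import numpy as np
from collections import deque

class SlidingWindowKVCache:
    def __init__(self, n_c: int, d: int):
        self.n_c = n_c                     # window size: max cached pairs
        self.d = d                         # key/query dimension
        self.keys = deque(maxlen=n_c)      # oldest pair is dropped automatically
        self.values = deque(maxlen=n_c)

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        # Adding the (n_c + 1)-th pair evicts the oldest one.
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q: np.ndarray) -> np.ndarray:
        # Scaled dot-product attention restricted to the local window.
        K = np.stack(self.keys)            # shape: (<= n_c, d)
        V = np.stack(self.values)
        scores = K @ q / np.sqrt(self.d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V

# Usage: after 100 steps the cache still holds only n_c = 4 pairs.
cache = SlidingWindowKVCache(n_c=4, d=8)
rng = np.random.default_rng(0)
for t in range(100):
    cache.append(rng.normal(size=8), rng.normal(size=8))
    out = cache.attend(q=rng.normal(size=8))
print(len(cache.keys))                     # -> 4
```

The design choice this illustrates: because the cache evicts the oldest pair on each insertion once it is full, both memory use and the per-step attention cost are bounded by n_c rather than by the full sequence length.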

References
Reference of Foundations of Large Language Models Course
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Summary Vectors for Memory Compression in Attention
General Recurrent Formula for Memory Update
Comparison of Memory Storage in Window-based and Moving Average Caches
Hybrid Cache for Attention Mechanisms
An attention mechanism is designed to use a memory component that has a constant, fixed size, regardless of how long the input sequence becomes. What is the primary computational consequence of this design choice as the input sequence length increases significantly?
Computational Cost Scaling in Attention Mechanisms
Optimizing a Real-Time Sequence Processing Model
Learn After
Formula for Fixed-Size Window Memory
Window-based Cache as an Example of Fixed-Size Memory
Space Complexity of Sliding Window Attention
Window Size (n_c)
A language model is designed to process extremely long sequences of text, and its developers are concerned about computational resources. They are considering two approaches for the attention mechanism: one that considers all previous tokens in the sequence, and another that only considers a fixed-size window of the 100 most recent tokens. What is the fundamental trade-off between these two approaches?
Applying Sliding Window Attention
In an attention mechanism that uses a fixed-size sliding window, the amount of memory required to store the keys and values for the attention calculation increases as the input sequence gets longer.
Your team is documenting the memory subsystem of a...
You are reviewing two candidate memory designs for...
You’re deploying an internal LLM assistant that mu...
You’re designing an internal LLM feature that moni...
Post-Incident Review: Memory Design for Long-Running Customer Support Chats
Diagnosing Long-Range Failures in a Segment-Processed LLM with Dual Memory
Choosing a Memory Architecture for Long-Context Enterprise Summarization
Postmortem: Long-Document QA Failures Under Fixed-Window vs Compressive Memory
Selecting and Justifying a Long-Context Memory Design for a Regulated Audit Assistant
Incident Triage: Long-Running Agent Workflow with Windowed vs Compressive Memory