Window-based Cache as an Example of Fixed-Size Memory
A window-based cache is a practical implementation of fixed-size memory. It stores a fixed number of the most recent key-value pairs from a sequence. When a new pair arrives and the cache is already full, the oldest pair is discarded, so the memory footprint stays constant no matter how long the sequence grows. For instance, a cache of size four retains the key-value pairs from the four most recent time steps, giving the model a localized context.
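The eviction behavior described above can be illustrated with a minimal Python sketch. The class name `WindowKVCache`, its methods, and the string placeholders `K1`, `V1`, and so on are hypothetical names chosen for illustration; in a real model the keys and values would be tensors. The sketch also assumes the pair for the current step is appended before the cache is read.

```python
from collections import deque


class WindowKVCache:
    """Hypothetical fixed-size cache holding the most recent (key, value) pairs."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # A deque with maxlen drops its oldest element automatically
        # when a new one is appended to a full deque.
        self._pairs = deque(maxlen=window_size)

    def append(self, key, value):
        """Store the pair for the newest time step, evicting the oldest if full."""
        self._pairs.append((key, value))

    def contents(self):
        """Return the cached pairs, ordered from oldest to newest."""
        return list(self._pairs)


# Illustrative run: a window of size four over six time steps.
cache = WindowKVCache(window_size=4)
for t in range(1, 7):
    cache.append(f"K{t}", f"V{t}")

print(cache.contents())
# [('K3', 'V3'), ('K4', 'V4'), ('K5', 'V5'), ('K6', 'V6')]
```

After six steps only the pairs from steps 3 through 6 remain, and the cache never holds more than four entries regardless of how many steps are processed.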
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for Fixed-Size Window Memory
Space Complexity of Sliding Window Attention
Window Size (n_c)
A language model is designed to process extremely long sequences of text, and its developers are concerned about computational resources. They are considering two approaches for the attention mechanism: one that considers all previous tokens in the sequence, and another that only considers a fixed-size window of the 100 most recent tokens. What is the fundamental trade-off between these two approaches?
Applying Sliding Window Attention
In an attention mechanism that uses a fixed-size sliding window, the amount of memory required to store the keys and values for the attention calculation increases as the input sequence gets longer.
Your team is documenting the memory subsystem of a...
You are reviewing two candidate memory designs for...
You’re deploying an internal LLM assistant that mu...
You’re designing an internal LLM feature that moni...
Post-Incident Review: Memory Design for Long-Running Customer Support Chats
Diagnosing Long-Range Failures in a Segment-Processed LLM with Dual Memory
Choosing a Memory Architecture for Long-Context Enterprise Summarization
Postmortem: Long-Document QA Failures Under Fixed-Window vs Compressive Memory
Selecting and Justifying a Long-Context Memory Design for a Regulated Audit Assistant
Incident Triage: Long-Running Agent Workflow with Windowed vs Compressive Memory
Learn After
Example of a Window-based Cache
A system processes a sequence of data, generating a new key-value pair at each time step. This system uses a memory component that is designed to store only the three most recent key-value pairs. If the system has just processed the fifth item in a sequence, generating the pair (K5, V5), which of the following sets of pairs would be stored in the memory component?
Diagnosing Memory System Limitations
A system uses a memory component that stores only the three most recent data items it has processed. The system processes the following sequence of items one by one: A, B, C, D, E. The states below represent the contents of the memory component at different points in time. Arrange these states in the chronological order in which they would occur.