Learn Before
Comparison of Memory Storage in Window-based and Moving Average Caches
Window-based and moving-average-based caches take different approaches to storing historical key-value pairs for attention mechanisms. A window-based cache stores a fixed number of recent pairs directly; for instance, a window of four pairs yields a memory size of 4x2 (four entries, each holding one key vector and one value vector). In contrast, a moving-average-based cache compresses the same four pairs into a single summary pair by averaging the keys and the values independently. This compression reduces the memory size to a constant 1x2 regardless of sequence length, providing a more memory-efficient representation.
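The two storage strategies can be sketched in a few lines of Python. This is a minimal illustration, not a production attention cache; the function names `window_cache` and `moving_average_cache` and the toy vectors are assumptions made for this example.

```python
import numpy as np

def window_cache(pairs, window=4):
    """Keep only the most recent `window` (key, value) pairs.
    Memory size: window x 2 vectors."""
    return pairs[-window:]

def moving_average_cache(pairs):
    """Compress all cached pairs into one summary pair by
    averaging the keys and the values independently.
    Memory size: a constant 1 x 2 vectors."""
    keys = np.stack([k for k, _ in pairs])
    values = np.stack([v for _, v in pairs])
    return [(keys.mean(axis=0), values.mean(axis=0))]

# Four toy (key, value) pairs, each key/value a 3-dimensional vector.
pairs = [(np.full(3, i, dtype=float), np.full(3, i * 10.0))
         for i in range(1, 5)]

w = window_cache(pairs)          # stores 4 pairs -> memory 4 x 2
m = moving_average_cache(pairs)  # stores 1 pair  -> memory 1 x 2
```

Here `len(w)` is 4 while `len(m)` is 1, which is exactly the 4x2 versus 1x2 memory comparison described above.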
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Fixed-Size Window Memory as a Form of Local Attention
Summary Vectors for Memory Compression in Attention
General Recurrent Formula for Memory Update
Comparison of Memory Storage in Window-based and Moving Average Caches
Hybrid Cache for Attention Mechanisms
An attention mechanism is designed to use a memory component that has a constant, fixed size, regardless of how long the input sequence becomes. What is the primary computational consequence of this design choice as the input sequence length increases significantly?
Computational Cost Scaling in Attention Mechanisms
Optimizing a Real-Time Sequence Processing Model
Learn After
An attention mechanism needs to process a long sequence of information and considers the 16 most recent key-value pairs to inform its output. One design stores all 16 pairs directly in a cache. An alternative design compresses these same 16 pairs into a single, averaged key-value pair. Assuming each key and each value is a single vector, what is the ratio of memory size required by the first design (storing all pairs) compared to the second design (storing the compressed pair)?
Memory-Efficient Cache Strategy Selection
Cache Suitability for High-Fidelity Tasks