Example of a Moving Average-based Cache
A moving average-based cache compresses a window of recent key-value pairs into a single summary pair. As illustrated, the four most recent key vectors and the four most recent value vectors are averaged independently of each other. The result is a memory component consisting of one summary key and one summary value, which reduces the memory from four pairs to one (a size of 1x2: one pair holding a key and a value). Each summary vector is an arithmetic mean: the summary key is the sum of the four key vectors divided by four, and the summary value is computed the same way from the four value vectors.
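The averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `mean_vector` and `MovingAverageCache`, the use of plain lists for vectors, and the toy two-dimensional inputs are all assumptions made for the example. It also assumes that when a new pair arrives, the oldest pair in the window is dropped and the averages are recomputed over the pairs that remain.

```python
from collections import deque


def mean_vector(vectors):
    """Element-wise arithmetic mean of equal-length vectors (hypothetical helper)."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


class MovingAverageCache:
    """Hypothetical sketch: keeps the last `window` (key, value) pairs
    and exposes their averages as a single summary pair."""

    def __init__(self, window=4):
        # A deque with maxlen discards the oldest pair automatically on append.
        self.pairs = deque(maxlen=window)

    def add(self, key, value):
        self.pairs.append((key, value))

    def summary(self):
        if not self.pairs:
            raise ValueError("cache is empty")
        keys = [k for k, _ in self.pairs]
        values = [v for _, v in self.pairs]
        # Keys and values are averaged independently of each other.
        return mean_vector(keys), mean_vector(values)


# Toy data (assumed for illustration): four pairs at steps 1..4.
cache = MovingAverageCache(window=4)
for step in range(1, 5):
    cache.add([float(step), 0.0], [0.0, float(step)])

summary_key, summary_value = cache.summary()
print(summary_key, summary_value)  # [2.5, 0.0] [0.0, 2.5]

# A fifth pair arrives: the pair from step 1 leaves the window.
cache.add([5.0, 0.0], [0.0, 5.0])
new_key, new_value = cache.summary()
print(new_key, new_value)  # [3.5, 0.0] [0.0, 3.5]
```

However many pairs have been seen, the cache never stores more than four of them and always reports exactly one summary key and one summary value. The cost is that the individual vectors inside the window can no longer be told apart once they are averaged.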

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for Memory as a Moving Average of Keys and Values
Cumulative Average of Keys and Values for Memory Component
Calculating a Memory Component Summary
When using a moving average of the last n key-value pairs to create a single summary vector for a memory component, what is the primary effect of significantly increasing the window size n?
Weighted Moving Average for Memory Component
A memory component in a transformer-based model is designed to create a summary by computing the simple, unweighted average of the last 10 key-value pairs. Which statement accurately describes a fundamental property of this specific summarization method?
Learn After
Calculating a Compressed Memory State
A system's memory component compresses the four most recent key-value pairs into a single summary pair by independently averaging their respective vectors. What is the most significant trade-off inherent in this compression technique?
A system uses a moving average-based cache that summarizes the four most recent key-value pairs into a single summary pair. The current summary is calculated from pairs at time steps t-4, t-3, t-2, and t-1. When a new key-value pair arrives at time step t, how is the cache updated?