Example

Example of a Moving Average-based Cache

A moving average-based cache compresses a window of recent key-value pairs into a single summary pair. As illustrated, the four most recent key vectors ($\mathbf{k}_{i-3}$ to $\mathbf{k}_{i}$) and value vectors ($\mathbf{v}_{i-3}$ to $\mathbf{v}_{i}$) are averaged independently. This process creates a memory component consisting of one summary key and one summary value, effectively reducing the memory size from four pairs to one (a size of $1 \times 2$). The specific calculations are:

$$\text{Memory Key} = \frac{\mathbf{k}_{i-3}+\mathbf{k}_{i-2}+\mathbf{k}_{i-1}+\mathbf{k}_{i}}{4}$$

$$\text{Memory Value} = \frac{\mathbf{v}_{i-3}+\mathbf{v}_{i-2}+\mathbf{v}_{i-1}+\mathbf{v}_{i}}{4}$$
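The averaging step can be sketched in a few lines of NumPy. This is a minimal illustration, not an implementation from the source: the function name `moving_average_memory`, the `window` parameter, and the toy two-dimensional vectors are all assumptions made for the example. Keys and values are assumed to be stored as arrays with one row per position.

```python
import numpy as np

def moving_average_memory(keys, values, window=4):
    """Average the last `window` key and value vectors into one summary pair.

    `keys` and `values` are assumed to have shape (num_positions, dim).
    The function name and signature are hypothetical, chosen for this sketch.
    """
    keys = np.asarray(keys, dtype=float)
    values = np.asarray(values, dtype=float)
    if keys.shape[0] < window or values.shape[0] < window:
        raise ValueError("need at least `window` key-value pairs")
    # Keys and values are averaged independently over the same window.
    memory_key = keys[-window:].mean(axis=0)
    memory_value = values[-window:].mean(axis=0)
    return memory_key, memory_value

# Toy data: five positions, two dimensions each (illustrative values only).
keys = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0], [9.0, 8.0]])
values = 10.0 * keys

# Only the four most recent pairs (rows 1..4) contribute to the summary.
mem_k, mem_v = moving_average_memory(keys, values, window=4)
print(mem_k)  # mean of the last four keys
print(mem_v)  # mean of the last four values
```

With `window=4`, the four most recent pairs are replaced by the single pair `(mem_k, mem_v)`, which is the one-key, one-value memory described above.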

Image 0

Updated 2025-10-10

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences