Comparison

Comparison of Memory Storage in Window-based and Moving Average Caches

Window-based and moving average-based caches take different approaches to storing historical key-value pairs for attention mechanisms. A window-based cache stores a fixed number of the most recent pairs unchanged; for instance, a window of four pairs occupies a memory of size 4x2 (four entries, each holding one key and one value). In contrast, a moving average-based cache compresses the same four pairs into a single summary pair by averaging the keys and the values independently. This compression reduces the memory size to a constant 1x2, regardless of how many pairs are averaged, at the cost of no longer retaining the individual pairs.
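The difference in memory footprint can be illustrated with a small sketch. It is a toy example under stated assumptions, not a reference implementation: the function names (`window_cache`, `moving_average_cache`, `mean_vector`), the two-dimensional toy vectors, and the use of a plain unweighted mean over the window are all illustrative choices that do not come from the text above.

```python
from collections import deque


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def window_cache(pairs, window=4):
    """Keep the last `window` (key, value) pairs unchanged."""
    return list(deque(pairs, maxlen=window))


def moving_average_cache(pairs, window=4):
    """Compress the last `window` pairs into one summary (key, value) pair.

    Keys and values are averaged independently (assumed: unweighted mean).
    """
    recent = window_cache(pairs, window)
    keys = [k for k, _ in recent]
    values = [v for _, v in recent]
    return (mean_vector(keys), mean_vector(values))


# Six hypothetical (key, value) pairs with 2-dimensional toy vectors.
pairs = [([float(i), 0.0], [0.0, float(i)]) for i in range(1, 7)]

windowed = window_cache(pairs)         # 4 pairs, i.e. memory of size 4x2
summary = moving_average_cache(pairs)  # 1 pair, i.e. memory of size 1x2

print(len(windowed), "pairs stored by the window cache")
print("summary pair:", summary)
```

With the window cache, memory grows with the window length; with the averaged cache, only one key vector and one value vector are kept no matter how many pairs fall inside the window.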

Updated 2025-10-10

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences