Learn Before
Summary Vectors for Memory Compression in Attention
An alternative to using a sliding window for the memory component (Mem) is to define it as a pair of summary vectors. Rather than storing a subset of the raw key-value pairs, this approach condenses the sequence's entire history into a fixed-size representation, so the memory cost stays constant no matter how long the input grows.
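As a minimal sketch, one way to build such summary vectors is a cumulative average of all preceding keys and values (one of the variants named under Learn After). The dimension `d`, the function name `update_summary`, and the random data below are illustrative assumptions, not part of the original text:

```python
import numpy as np

d = 8  # head dimension (illustrative choice)

# Summary memory: a single condensed key and value for the whole history.
k_bar = np.zeros(d)
v_bar = np.zeros(d)
t = 0  # number of positions folded into the summary so far


def update_summary(k_t, v_t):
    """Fold a new key/value pair into the fixed-size summary vectors
    via a running (cumulative) average: x_bar += (x_t - x_bar) / t."""
    global k_bar, v_bar, t
    t += 1
    k_bar += (k_t - k_bar) / t
    v_bar += (v_t - v_bar) / t


# Stream 100 key/value pairs through the memory.
rng = np.random.default_rng(0)
keys = rng.normal(size=(100, d))
values = rng.normal(size=(100, d))
for k_t, v_t in zip(keys, values):
    update_summary(k_t, v_t)

# The summary equals the mean of everything seen so far, yet the memory
# itself remains two d-dimensional vectors regardless of sequence length.
assert np.allclose(k_bar, keys.mean(axis=0))
assert np.allclose(v_bar, values.mean(axis=0))
```

Unlike a 256-position window, the per-step cost of reading this memory does not depend on how much history has been summarized, which is the trade-off the question below asks about.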
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Fixed-Size Window Memory as a Form of Local Attention
Summary Vectors for Memory Compression in Attention
General Recurrent Formula for Memory Update
Comparison of Memory Storage in Window-based and Moving Average Caches
Hybrid Cache for Attention Mechanisms
An attention mechanism is designed to use a memory component that has a constant, fixed size, regardless of how long the input sequence becomes. What is the primary computational consequence of this design choice as the input sequence length increases significantly?
Computational Cost Scaling in Attention Mechanisms
Optimizing a Real-Time Sequence Processing Model
Learn After
Moving Average of Keys and Values for Memory Component
Weighted Moving Average for Memory Component
Cumulative Average of Keys and Values for Memory Component
An engineer is designing a language model that must process very long sequences while keeping the computational cost of attention constant at each step. They are considering two approaches for the model's memory component:
- Approach 1: The memory stores the raw key-value pairs from the 256 most recent positions in the sequence.
- Approach 2: The memory is a pair of fixed-size 'summary' vectors, which are calculated by mathematically combining all preceding key-value pairs into a single, condensed representation.
Which statement best analyzes the primary trade-off between these two approaches?
Memory Representation in Attention Mechanisms
Recurrent Update for Memory Caching
Optimizing Memory for Long-Sequence Processing