1Cademy - Hybrid Cache for Attention Mechanisms

Learn Before

Fixed-Size Memory for Constant Attention Cost

Concept

Hybrid Cache for Attention Mechanisms

A hybrid cache is a memory management strategy that combines two types of memory to handle long sequences within a fixed memory budget. As illustrated in the diagram, it consists of a 'Local Memory' and a 'Compressed Memory'. The Local Memory (e.g., size 4x2) stores a fixed number of the most recent key-value pairs in their original, uncompressed form. When a new key-value pair arrives and the Local Memory is full, the oldest pair is evicted, passed through a compression function, and stored in the Compressed Memory (e.g., size 2x2). This two-level design lets a model keep high-fidelity information about the recent past while retaining a summarized, space-efficient representation of the more distant past.
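The eviction-and-compression flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the class name `HybridCache`, its parameters, and in particular the compression function (a running mean folded into round-robin slots) are assumptions chosen for simplicity. The description above does not specify which compression function is used.

```python
from collections import deque


class HybridCache:
    """Hypothetical sketch: a local window of recent key-value pairs
    plus a fixed-size compressed memory for evicted pairs."""

    def __init__(self, local_size=4, compressed_size=2):
        self.local_size = local_size
        self.compressed_size = compressed_size
        self.local = deque()   # most recent (key, value) pairs, uncompressed
        self.compressed = []   # at most compressed_size slots: [key, value, count]
        self.num_evicted = 0

    def append(self, key, value):
        """Add a new pair; evict and compress the oldest one if the window is full."""
        self.local.append((list(key), list(value)))
        if len(self.local) > self.local_size:
            old_key, old_value = self.local.popleft()
            self._compress(old_key, old_value)

    def _compress(self, key, value):
        # Assumed compression function: fold the evicted pair into one slot
        # (chosen round-robin) as a running mean. Real systems may instead
        # use learned or pooling-based compression.
        slot = self.num_evicted % self.compressed_size
        self.num_evicted += 1
        if slot >= len(self.compressed):
            self.compressed.append([key, value, 1])
            return
        entry = self.compressed[slot]
        n = entry[2]
        entry[0] = [(n * a + b) / (n + 1) for a, b in zip(entry[0], key)]
        entry[1] = [(n * a + b) / (n + 1) for a, b in zip(entry[1], value)]
        entry[2] = n + 1

    def entries(self):
        """Pairs that attention would read: compressed summaries, then the local window."""
        return [(k, v) for k, v, _ in self.compressed] + list(self.local)


# Usage: feed 10 toy key-value pairs through a cache with 4 local and 2 compressed slots.
cache = HybridCache(local_size=4, compressed_size=2)
for t in range(10):
    cache.append([float(t), 0.0], [float(t), 1.0])
print(len(cache.entries()))  # stays at local_size + compressed_size once both are full
```

However many pairs are appended, the number of entries attention must read never exceeds `local_size + compressed_size`, which is what keeps the per-step cost constant.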

Updated 2025-10-10
