Compressive Transformer Memory Architecture
Segment-level memory models can be extended to use multiple memory components. The Compressive Transformer is a prime example of this architecture: it employs two distinct, fixed-size memories to manage different spans of history. A local memory, denoted by Mem, stores recent context in its original, uncompressed form, while a secondary compressed memory, denoted by CMem, stores a lossy summary of older, long-term history. As new context arrives, the oldest entries evicted from the local memory are compressed and moved into the compressed memory rather than being discarded outright. In this model, the Key-Value (KV) cache used by attention is the combination of Mem and CMem.
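To make the update cycle concrete, here is a minimal sketch of the two-memory bookkeeping in PyTorch. The class name DualMemory, the memory sizes, and the mean-pooling compressor are illustrative assumptions rather than the paper's exact implementation; the original work (Rae et al., 2019) explores several compression functions, including pooling, convolution, and attention-based reconstruction, and keeps separate memories per layer, which this single-memory sketch omits.

```python
import torch

class DualMemory:
    """Sketch of the Compressive Transformer's two fixed-size memories.

    `mem` holds the most recent key/value states verbatim (local memory);
    `cmem` holds older states after lossy compression (compressed memory).
    Names, sizes, and the mean-pooling compressor are illustrative
    assumptions, not the paper's exact design.
    """

    def __init__(self, mem_size: int, cmem_size: int, rate: int, d_model: int):
        self.mem_size, self.cmem_size, self.rate = mem_size, cmem_size, rate
        self.mem = torch.empty(0, d_model)   # recent, uncompressed states
        self.cmem = torch.empty(0, d_model)  # older, compressed states

    def compress(self, states: torch.Tensor) -> torch.Tensor:
        # Mean-pool every `rate` consecutive states into one summary vector.
        # Any remainder shorter than `rate` is dropped in this sketch.
        n = (states.shape[0] // self.rate) * self.rate
        return states[:n].reshape(-1, self.rate, states.shape[-1]).mean(dim=1)

    def update(self, new_states: torch.Tensor) -> None:
        # 1) Append the new segment's states to the local memory.
        self.mem = torch.cat([self.mem, new_states], dim=0)
        # 2) Evict the oldest local entries (FIFO) once capacity is exceeded...
        if self.mem.shape[0] > self.mem_size:
            evicted, self.mem = self.mem[:-self.mem_size], self.mem[-self.mem_size:]
            # 3) ...compress them and push them into the compressed memory,
            #    whose own oldest summaries are discarded past capacity.
            self.cmem = torch.cat([self.cmem, self.compress(evicted)], dim=0)
            self.cmem = self.cmem[-self.cmem_size:]

    def kv_cache(self) -> torch.Tensor:
        # Attention reads the concatenation of both memories: old summaries
        # first, then recent high-fidelity states.
        return torch.cat([self.cmem, self.mem], dim=0)
```

A short usage example with made-up sizes, showing that the cache attention sees stays bounded while still covering compressed older context:

```python
# 4-token segments, a local memory of 8 states, 2:1 compression.
dm = DualMemory(mem_size=8, cmem_size=8, rate=2, d_model=16)
for _ in range(6):
    dm.update(torch.randn(4, 16))  # process one segment's key (or value) states
print(dm.kv_cache().shape)         # torch.Size([16, 16]): 8 compressed + 8 recent
```

The design trade-off is visible in `compress`: eviction from the local memory is lossy, so fine-grained detail of old context is sacrificed to keep total memory fixed while retaining a summary of the long-term history.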
