1Cademy - Compression of Key-Value Pairs for Compressive Memory

Learn Before

Compressive Memory Update in Compressive Transformer

Concept

Compression of Key-Value Pairs for Compressive Memory

During the memory update in the Compressive Transformer, the $n_s$ key-value pairs popped from the primary memory ($\mathrm{Mem}$) are not discarded. Instead, a compression network reduces them to a smaller set of $\frac{n_s}{c}$ key-value pairs, where $c$ is the compression rate, and these compressed pairs are then added to the compressive memory ($\mathrm{CMem}$).
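The compression step can be illustrated with a minimal NumPy sketch. It assumes mean pooling over groups of consecutive pairs as the compression function, which is only one possible choice and stands in for whatever compression network the model actually uses. The function name `compress_kv`, the array shapes, and the requirement that $n_s$ be divisible by $c$ are assumptions made for this illustration.

```python
import numpy as np

def compress_kv(keys, values, c):
    """Compress n_s key-value pairs into n_s / c pairs by mean pooling.

    keys:   array of shape (n_s, d_k)
    values: array of shape (n_s, d_v)
    c:      number of consecutive pairs merged into one (assumed to divide n_s)
    """
    n_s = keys.shape[0]
    if values.shape[0] != n_s:
        raise ValueError("keys and values must contain the same number of pairs")
    if n_s % c != 0:
        raise ValueError("this sketch assumes n_s is divisible by c")
    # Group every c consecutive pairs together, then average within each group.
    pooled_keys = keys.reshape(n_s // c, c, keys.shape[1]).mean(axis=1)
    pooled_values = values.reshape(n_s // c, c, values.shape[1]).mean(axis=1)
    return pooled_keys, pooled_values

# Hypothetical usage: n_s = 6 pairs popped from Mem, compressed with c = 3.
popped_keys = np.random.randn(6, 4)
popped_values = np.random.randn(6, 4)
cmem_keys, cmem_values = compress_kv(popped_keys, popped_values, c=3)
print(cmem_keys.shape, cmem_values.shape)  # two compressed pairs remain
```

A larger `c` stores fewer pairs per update, so the compressive memory covers a longer history at the cost of coarser detail.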

Updated 2026-04-23

References

Reference of Foundations of Large Language Models Course

Learn After

Formula for Number of Compressed Key-Value Pairs
A language model is designed with a memory system where older key-value pairs from a primary, fixed-size memory buffer are processed by a network to create a smaller, summarized set of key-value pairs for long-term storage. Which statement best analyzes the fundamental trade-off when deciding how aggressively this network should summarize the information?
In a transformer model equipped with a two-tiered memory system, a batch of 50 key-value pairs representing older information is moved from the short-term memory. Before being stored in the long-term, compressed memory, this batch is processed by a dedicated compression network. Which of the following outcomes best describes the primary function of this compression network on the batch?
A transformer model uses a two-tiered memory system. When the short-term memory buffer is full, the oldest set of key-value pairs is moved to a long-term, compressed memory. Arrange the following events in the correct chronological order to describe this memory update process.
