Compression of Key-Value Pairs for Compressive Memory
During the memory update in the Compressive Transformer, the key-value pairs popped from the primary memory are not discarded. Instead, a compression network condenses them into a smaller set of key-value pairs, which is then added to the compressive memory.
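The update described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual implementation: mean pooling over groups of consecutive pairs stands in for the learned compression network, and the names mean_pool_compress, update_memories, memory_size, and rate are hypothetical, chosen only for this example.

```python
from collections import deque


def mean_pool_compress(pairs, rate):
    """Average each group of `rate` consecutive (key, value) pairs into one pair.

    Assumption: mean pooling is a simple stand-in for a learned compression network.
    """
    compressed = []
    for start in range(0, len(pairs), rate):
        group = pairs[start:start + rate]
        n = len(group)
        key_dim = len(group[0][0])
        value_dim = len(group[0][1])
        key = [sum(k[i] for k, _ in group) / n for i in range(key_dim)]
        value = [sum(v[i] for _, v in group) / n for i in range(value_dim)]
        compressed.append((key, value))
    return compressed


def update_memories(memory, compressive_memory, new_pairs, memory_size, rate):
    """Append new pairs to the primary memory, pop the overflow (oldest first),
    compress the popped pairs, and add the result to the compressive memory."""
    memory.extend(new_pairs)
    overflow = max(0, len(memory) - memory_size)
    popped = [memory.popleft() for _ in range(overflow)]
    if popped:
        compressive_memory.extend(mean_pool_compress(popped, rate))
    return popped


# Toy example: one-dimensional keys and values, primary memory of size 4.
memory = deque(([float(i)], [float(i) * 10]) for i in range(4))
compressive_memory = deque()
new_pairs = [([float(i)], [float(i) * 10]) for i in range(4, 8)]
popped = update_memories(memory, compressive_memory, new_pairs, memory_size=4, rate=2)
```

In this toy run, the four oldest pairs are popped from the primary memory and reduced to two pairs in the compressive memory, so nothing is thrown away outright but the older information is stored at a coarser granularity.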

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
FIFO Update of Compressive Memory
A long-context language model utilizes two distinct memory systems to manage information over time: a primary, fixed-size memory that holds recent, detailed information, and a secondary, compressed memory for older information. The primary memory operates by discarding its oldest entries to accommodate new data. Given this mechanism, what is the most direct source of information for updating the secondary, compressed memory?
A language model is designed with a two-tiered memory system to handle long documents. It has a fixed-size 'short-term memory' for recent, detailed information and a 'long-term memory' for older, summarized information. When a new segment of text is processed, arrange the following events in the correct chronological order to show how information flows between these two memory systems.
Relationship Between Memory Tiers in a Language Model
Learn After
Formula for Number of Compressed Key-Value Pairs
A language model is designed with a memory system where older key-value pairs from a primary, fixed-size memory buffer are processed by a network to create a smaller, summarized set of key-value pairs for long-term storage. Which statement best analyzes the fundamental trade-off when deciding how aggressively this network should summarize the information?
In a transformer model equipped with a two-tiered memory system, a batch of 50 key-value pairs representing older information is moved from the short-term memory. Before being stored in the long-term, compressed memory, this batch is processed by a dedicated compression network. Which of the following outcomes best describes the primary function of this compression network on the batch?
A transformer model uses a two-tiered memory system. When the short-term memory buffer is full, the oldest set of key-value pairs is moved to a long-term, compressed memory. Arrange the following events in the correct chronological order to describe this memory update process.