Formula for Number of Compressed Key-Value Pairs
In the Compressive Transformer, the number of key-value pairs after compression, denoted $n_{cm}$, is calculated by dividing the number of key-value pairs popped from the local memory, $n_s$, by a compression ratio, $c$. This relationship is expressed by the formula:

$$n_{cm} = \frac{n_s}{c}$$

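The formula can be sketched in Python as follows. This assumes $n_s$ and $c$ are integers, with $c \ge 1$. The function name `compressed_pair_count` is hypothetical. The card does not say what happens when $n_s$ is not a multiple of $c$, so rounding down with floor division is an assumption.

```python
def compressed_pair_count(n_popped: int, ratio: int) -> int:
    """Number of compressed key-value pairs, n_cm = n_s / c.

    n_popped: n_s, the key-value pairs popped from the local memory.
    ratio:    c, the compression ratio.
    Assumption (not stated in the card): if n_popped is not a multiple
    of ratio, the result is rounded down via floor division.
    """
    if n_popped < 0:
        raise ValueError("n_popped must be non-negative")
    if ratio < 1:
        raise ValueError("compression ratio must be at least 1")
    return n_popped // ratio


# Example: 48 popped pairs compressed at a ratio of 3.
print(compressed_pair_count(48, 3))  # prints 16
```

A larger $c$ stores fewer long-term pairs for the same $n_s$, which saves memory at the cost of coarser summaries of older context.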
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A language model is designed with a memory system where older key-value pairs from a primary, fixed-size memory buffer are processed by a network to create a smaller, summarized set of key-value pairs for long-term storage. Which statement best analyzes the fundamental trade-off when deciding how aggressively this network should summarize the information?
In a transformer model equipped with a two-tiered memory system, a batch of 50 key-value pairs representing older information is moved from the short-term memory. Before being stored in the long-term, compressed memory, this batch is processed by a dedicated compression network. Which of the following outcomes best describes the primary function of this compression network on the batch?
A transformer model uses a two-tiered memory system. When the short-term memory buffer is full, the oldest set of key-value pairs is moved to a long-term, compressed memory. Arrange the following events in the correct chronological order to describe this memory update process.
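The two-tiered update described in these prompts can be sketched in plain Python under several assumptions. Keys and values are short lists of floats. Eviction happens as soon as the local buffer exceeds its capacity. The learned compression network is replaced by a fixed mean-pooling stand-in that averages each group of `ratio` popped pairs into one pair. The names `TwoTierMemory`, `mean_pool`, `local_size`, `n_popped`, and `ratio` are hypothetical, and this is not the model's actual implementation.

```python
from collections import deque
from typing import Deque, List, Tuple

Vector = List[float]
KVPair = Tuple[Vector, Vector]  # (key, value)


def mean_pool(group: List[KVPair]) -> KVPair:
    """Hypothetical stand-in for the learned compression network:
    average the keys and the values of a group element-wise."""
    n = len(group)
    key = [sum(k[i] for k, _ in group) / n for i in range(len(group[0][0]))]
    value = [sum(v[i] for _, v in group) / n for i in range(len(group[0][1]))]
    return key, value


class TwoTierMemory:
    """Sketch of a fixed-size local (short-term) memory plus a compressed
    (long-term) memory."""

    def __init__(self, local_size: int, n_popped: int, ratio: int) -> None:
        if n_popped > local_size:
            raise ValueError("cannot pop more pairs than the local memory holds")
        if n_popped % ratio != 0:
            raise ValueError("this sketch assumes n_popped is a multiple of ratio")
        self.local_size = local_size
        self.n_popped = n_popped  # n_s
        self.ratio = ratio        # c
        self.local: Deque[KVPair] = deque()
        self.compressed: List[KVPair] = []

    def append(self, pair: KVPair) -> None:
        # 1. The newest key-value pair enters the local memory.
        self.local.append(pair)
        # 2. Once the buffer exceeds its capacity, pop the oldest n_s pairs.
        if len(self.local) > self.local_size:
            popped = [self.local.popleft() for _ in range(self.n_popped)]
            # 3. Compress each group of c popped pairs into one pair,
            #    producing n_s / c new entries in the long-term memory.
            for start in range(0, self.n_popped, self.ratio):
                self.compressed.append(mean_pool(popped[start:start + self.ratio]))


mem = TwoTierMemory(local_size=8, n_popped=4, ratio=2)
for t in range(9):
    mem.append(([float(t)], [float(t)]))
print(len(mem.local), len(mem.compressed))  # prints 5 2
```

The ninth insertion overflows the 8-slot buffer, so the four oldest pairs are popped and pooled in pairs, adding $4 / 2 = 2$ entries to the compressed memory.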
Learn After
Memory Compression Scenario
A transformer model with a compressive memory system needs to update its long-term storage. It takes 120 key-value pairs from its short-term memory and applies a compression function with a ratio of 4. How many new key-value pairs will be added to the long-term compressive memory after this operation?
Calculating Compression Ratio