A transformer model uses a two-tiered memory system. When the short-term memory buffer is full, the oldest set of key-value pairs is moved to a long-term, compressed memory. Arrange the following events in the correct chronological order to describe this memory update process.
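The update process described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual implementation: the class `TwoTierMemory`, the parameters `short_capacity`, `evict_size`, and `rate`, and the mean-pooling function `mean_pool_compress` are all hypothetical names introduced here. Mean pooling stands in for whatever compression network the model really uses, and keys and values are plain floats rather than vectors to keep the example short.

```python
from collections import deque


def mean_pool_compress(batch, rate):
    """Stand-in compressor (assumption): average every `rate` consecutive
    (key, value) pairs into one summarized pair."""
    compressed = []
    for start in range(0, len(batch), rate):
        group = batch[start:start + rate]
        key = sum(k for k, _ in group) / len(group)
        value = sum(v for _, v in group) / len(group)
        compressed.append((key, value))
    return compressed


class TwoTierMemory:
    """Hypothetical two-tier memory: a bounded short-term buffer plus a
    long-term store that holds compressed key-value pairs."""

    def __init__(self, short_capacity, evict_size, rate):
        self.short = deque()   # recent, uncompressed key-value pairs
        self.long = []         # older, compressed key-value pairs
        self.short_capacity = short_capacity
        self.evict_size = evict_size
        self.rate = rate

    def append(self, kv_pairs):
        # Step 1: new key-value pairs enter the short-term buffer.
        self.short.extend(kv_pairs)
        # Step 2: while the buffer is over capacity, take the oldest pairs out.
        while len(self.short) > self.short_capacity:
            n = min(self.evict_size, len(self.short))
            oldest = [self.short.popleft() for _ in range(n)]
            # Step 3: compress the evicted batch into fewer pairs.
            # Step 4: store the compressed pairs in long-term memory.
            self.long.extend(mean_pool_compress(oldest, self.rate))


memory = TwoTierMemory(short_capacity=4, evict_size=2, rate=2)
memory.append([(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)])
memory.append([(5.0, 50.0), (6.0, 60.0)])  # overflow triggers one eviction
```

In this sketch the order of events is: new pairs arrive, the buffer overflows, the oldest pairs are removed, they are compressed, and the compressed pairs are appended to long-term memory. A larger `rate` saves more space but discards more detail from the evicted pairs.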
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formula for Number of Compressed Key-Value Pairs
A language model is designed with a memory system where older key-value pairs from a primary, fixed-size memory buffer are processed by a network to create a smaller, summarized set of key-value pairs for long-term storage. Which statement best analyzes the fundamental trade-off when deciding how aggressively this network should summarize the information?
In a transformer model equipped with a two-tiered memory system, a batch of 50 key-value pairs representing older information is moved from the short-term memory. Before being stored in the long-term, compressed memory, this batch is processed by a dedicated compression network. Which of the following outcomes best describes the primary function of this compression network on the batch?