An attention mechanism processes a long sequence of information but considers only the 16 most recent key-value pairs when computing its output. One design stores all 16 pairs directly in a cache. An alternative design compresses the same 16 pairs into a single, averaged key-value pair. Assuming each key and each value is a single vector of the same size, what is the ratio of the memory required by the first design (storing all pairs) to that required by the second design (storing the compressed pair)?
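One way to check the arithmetic is to count stored vectors directly. Below is a minimal sketch in Python; the variable names and the 64-dimensional vectors are illustrative assumptions, not part of the question. The full cache holds 16 keys plus 16 values (32 vectors), while the compressed design averages them down to one key plus one value (2 vectors), so the ratio is 16:1 regardless of vector width.

```python
import numpy as np

# Illustrative setup: 16 cached key-value pairs, each key/value a
# d-dimensional vector (d = 64 is an arbitrary choice; the final
# ratio does not depend on it).
n_pairs, d = 16, 64
keys = np.random.randn(n_pairs, d)
values = np.random.randn(n_pairs, d)

# Design 1: store all 16 key-value pairs verbatim.
full_cache_vectors = keys.shape[0] + values.shape[0]   # 16 keys + 16 values = 32

# Design 2: average the 16 pairs into a single key-value pair.
compressed_key = keys.mean(axis=0)      # one d-dimensional key
compressed_value = values.mean(axis=0)  # one d-dimensional value
compressed_cache_vectors = 2            # 1 key + 1 value

# Memory ratio of design 1 to design 2, counted in stored vectors
# (equivalently in bytes, since every vector has the same size).
ratio = full_cache_vectors / compressed_cache_vectors
print(ratio)  # -> 16.0
```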
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Memory-Efficient Cache Strategy Selection
Cache Suitability for High-Fidelity Tasks