Memory-Efficient Cache Strategy Selection
Based on the primary constraint of limited memory, which caching method should the engineer choose? Justify your answer by explaining the fundamental difference in memory usage between the two approaches.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An attention mechanism needs to process a long sequence of information and considers the 16 most recent key-value pairs to inform its output. One design stores all 16 pairs directly in a cache. An alternative design compresses these same 16 pairs into a single, averaged key-value pair. Assuming each key and each value is a single vector, what is the ratio of memory size required by the first design (storing all pairs) compared to the second design (storing the compressed pair)?
Cache Suitability for High-Fidelity Tasks
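
The memory comparison in the related question on 16 key-value pairs can be sketched with a little arithmetic. This is a minimal illustration, not a reference solution: the helper name `cache_size` and the vector dimension `dim = 64` are assumptions made for the example, and the dimension cancels out of the ratio.

```python
def cache_size(num_pairs: int, dim: int) -> int:
    """Number of stored scalars when each pair holds one key vector
    and one value vector, each of length `dim` (hypothetical helper)."""
    return num_pairs * 2 * dim


dim = 64  # assumed vector length; any positive value gives the same ratio

full_cache = cache_size(16, dim)        # first design: all 16 pairs kept
compressed_cache = cache_size(1, dim)   # second design: one averaged pair

ratio = full_cache // compressed_cache
print(f"full: {full_cache}, compressed: {compressed_cache}, ratio: {ratio}:1")
```

Under these assumptions, storing every pair costs memory proportional to the number of pairs kept, while the averaged design stays at the cost of a single pair regardless of how many pairs were summarized.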