Learn Before
A language model is designed to process extremely long sequences of text during inference. To manage computational resources, it is implemented with a key-value (KV) cache that has a fixed, limited size. What is the primary trade-off inherent in this specific implementation choice?
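The trade-off the question points at can be made concrete with a small sketch. The class below is hypothetical (not from any real library): it models a fixed-size KV cache with sliding-window eviction, where memory stays bounded but key/value pairs for the oldest tokens are dropped, so the model can no longer attend to context outside the window.

```python
from collections import deque

class SlidingWindowKVCache:
    """Hypothetical sketch of a fixed-size KV cache.

    When the cache is full, the oldest token's key/value pair is
    evicted: memory use is bounded, at the cost of losing access
    to early context.
    """

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        # each entry is a (key, value) pair for one token position
        self.entries: deque = deque(maxlen=max_entries)

    def append(self, key, value) -> None:
        # deque with maxlen silently drops the oldest entry when full,
        # modeling the trade-off: memory stays O(max_entries), but
        # tokens older than the window can no longer be attended to
        self.entries.append((key, value))

    def visible_positions(self, current_pos: int) -> range:
        # positions the model can still attend to at this step
        start = max(0, current_pos + 1 - len(self.entries))
        return range(start, current_pos + 1)

cache = SlidingWindowKVCache(max_entries=4)
for pos in range(6):
    cache.append(f"k{pos}", f"v{pos}")

# after 6 tokens with a 4-entry cache, positions 0 and 1 were evicted
print(len(cache.entries))                # 4
print(list(cache.visible_positions(5)))  # [2, 3, 4, 5]
```

This illustrates why the design choice trades unbounded context for predictable memory: inference cost per token stays constant, but information from tokens outside the window is irrecoverably discarded.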
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Optimizing a Conversational AI for Memory-Constrained Devices
Consequences of Bounded Memory in Text Summarization
Components of Fixed-Size KV Caches