Learn Before
Components of Fixed-Size KV Caches
In Large Language Models (LLMs), a fixed-size KV cache bounds memory use by managing three distinct sets of keys and values: those generated dynamically during active inference, those preserved in the model's primary memory, and those stored or encoded in a compressed memory, which retains older contextual information without exceeding the fixed capacity.
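Below is a minimal sketch of how these three components might fit together. It assumes a recent-token window, an overflow buffer awaiting compression, and a bounded compressed memory; the class name, the window_size/memory_size/ratio parameters, and the mean-pooling compression are illustrative assumptions, not any specific model's implementation.

```python
import torch


class FixedSizeKVCache:
    """Illustrative fixed-size KV cache with three components:
    an exact window of recent keys/values (active inference),
    a buffer of evicted entries awaiting compression, and a
    bounded compressed memory summarizing older context."""

    def __init__(self, window_size: int, memory_size: int, ratio: int = 4):
        self.window_size = window_size  # recent keys/values kept exactly
        self.memory_size = memory_size  # cap on compressed summary entries
        self.ratio = ratio              # evicted entries per summary entry
        self.window: list[tuple[torch.Tensor, torch.Tensor]] = []
        self.buffer: list[tuple[torch.Tensor, torch.Tensor]] = []
        self.memory: list[tuple[torch.Tensor, torch.Tensor]] = []

    def append(self, k: torch.Tensor, v: torch.Tensor) -> None:
        """Cache one new token's key/value; once the window is full,
        evict the oldest entry into the compression buffer."""
        self.window.append((k, v))
        if len(self.window) > self.window_size:
            self.buffer.append(self.window.pop(0))
        if len(self.buffer) == self.ratio:
            self._compress_buffer()

    def _compress_buffer(self) -> None:
        # Mean-pool the buffered keys/values into one summary entry
        # (a real system might use a learned compression instead).
        k_summary = torch.stack([k for k, _ in self.buffer]).mean(dim=0)
        v_summary = torch.stack([v for _, v in self.buffer]).mean(dim=0)
        self.buffer.clear()
        self.memory.append((k_summary, v_summary))
        if len(self.memory) > self.memory_size:
            self.memory.pop(0)  # bounded capacity: drop oldest summary

    def attend_over(self) -> tuple[torch.Tensor, torch.Tensor]:
        """Return every key/value attention should see, oldest first:
        compressed memory, then the buffer, then the exact window."""
        entries = self.memory + self.buffer + self.window
        keys = torch.stack([k for k, _ in entries])
        values = torch.stack([v for _, v in entries])
        return keys, values


# Usage: total cached entries stay bounded regardless of sequence length.
cache = FixedSizeKVCache(window_size=8, memory_size=4, ratio=4)
for _ in range(100):
    cache.append(torch.randn(64), torch.randn(64))
keys, values = cache.attend_over()
print(keys.shape)  # at most window_size + memory_size + ratio entries
```

The design choice this illustrates is the trade-off named in the related question below: recent tokens keep exact keys and values, while older context survives only as lossy summaries, so memory stays constant at the cost of fidelity to distant history.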
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A language model is designed to process extremely long sequences of text during inference. To manage computational resources, it is implemented with a key-value (KV) cache that has a fixed, limited size. What is the primary trade-off inherent in this specific implementation choice?
Optimizing a Conversational AI for Memory-Constrained Devices
Consequences of Bounded Memory in Text Summarization