Learn Before
Formula for KV Cache Memory Size
The memory footprint of the Key-Value (KV) cache for a specific context window size can be quantified. The total size is proportional to the product of four key parameters: the number of layers in the model (L), the number of attention heads per layer (H), the dimensionality of each head's key/value vectors (D_head), and the size of the context window (n). The overall memory complexity is therefore given by the formula: O(L · H · D_head · n).
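The formula above can be sketched as a small calculation. This is a minimal illustration, not a definitive sizing tool: the function name, the factor of 2 (one tensor for keys, one for values), the bytes-per-value default (fp16), and the example model shape (32 layers, 32 heads, head dimension 128) are all assumptions added here for concreteness.

```python
def kv_cache_bytes(num_layers, num_heads, head_dim, context_len,
                   bytes_per_value=2):
    """Approximate KV cache size in bytes.

    Memory ~ L * H * D_head * n, times 2 because both keys and values
    are cached, times the storage size of each value (2 bytes for fp16).
    """
    return 2 * num_layers * num_heads * head_dim * context_len * bytes_per_value

# Hypothetical 7B-class model: 32 layers, 32 heads, head_dim 128,
# 4096-token context, fp16 storage.
size = kv_cache_bytes(32, 32, 128, 4096)
print(f"{size / 2**30:.1f} GiB")  # -> 2.0 GiB
```

Note that the cache grows linearly in the context length: doubling `context_len` doubles the memory, which is the scaling behavior probed by the questions below.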

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Reducing KV Cache Complexity via Windowed Caching
An engineer is deploying a large autoregressive model for a chatbot. They observe that as a conversation with a user gets longer, the model's memory consumption increases steadily, eventually leading to performance issues. This is because the model stores key and value vectors for every token in the conversation history to speed up the generation of the next token. Based on this mechanism, what is the fundamental relationship between the length of the conversation history (in tokens) and the amount of memory required for this storage?
KV Cache Memory Footprint Comparison
Calculating Memory Growth for Token Caching
Reducing KV Cache Complexity via Head Sharing
Learn After
An autoregressive language model uses a key-value cache to store contextual information during text generation. A developer decides to double the maximum sequence length that the model can process. Assuming all other architectural parameters (such as the number of layers, number of attention heads, and the dimensionality of each head) remain constant, by what factor will the maximum memory required for the key-value cache change?
Optimizing KV Cache for a Chatbot Application
KV Cache Memory Calculation