Learn Before
KV Cache Memory Calculation
An autoregressive language model uses a key-value (KV) cache to store the key and value vectors for each token in the context window, across all attention heads and layers, so that subsequent tokens can be generated without recomputing them. Given the following parameters for a specific model, calculate the total number of individual floating-point values that would be stored in this cache when it is completely full.
- Number of layers: 32
- Number of attention heads per layer: 12
- Dimensionality of each head's key/value vector: 64
- Context window size: 2048 tokens
Provide only the final numerical answer.
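The arithmetic can be sketched in a few lines: the cache holds one key vector and one value vector (a factor of 2) per token, per head, per layer. The function name below is illustrative, not from the card.

```python
def kv_cache_values(layers: int, heads: int, head_dim: int, context: int) -> int:
    """Total floating-point values in a full KV cache.

    Factor of 2 accounts for storing both keys and values.
    """
    return 2 * layers * heads * head_dim * context

# Parameters from the question above.
print(kv_cache_values(layers=32, heads=12, head_dim=64, context=2048))
# → 100663296
```

Note that this counts values, not bytes; the memory footprint also depends on the numeric precision (e.g. 2 bytes per value in fp16), and doubling the context window doubles the count, as the related question below explores.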
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autoregressive language model uses a key-value cache to store contextual information during text generation. A developer decides to double the maximum sequence length that the model can process. Assuming all other architectural parameters (such as the number of layers, number of attention heads, and the dimensionality of each head) remain constant, by what factor will the maximum memory required for the key-value cache change?
Optimizing KV Cache for a Chatbot Application
KV Cache Memory Calculation