Key-Value Cache
In autoregressive transformer models, a Key-Value (KV) cache is a mechanism that optimizes generation during inference. It stores the key and value matrices computed at all previous steps so they do not need to be recomputed for each newly generated token. Because the model performs attention over a group of feature sub-spaces in parallel, the KV cache must retain the keys and values for every attention head. The KV cache at generation step $i$ can be represented as the set of historical keys and values across these heads: $\{(\mathbf{K}^{(h)}_{\le i}, \mathbf{V}^{(h)}_{\le i})\}_{h=1}^{H}$, where $H$ is the number of attention heads.
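The caching idea can be sketched in a few lines of plain Python. This is a minimal illustration for a single attention head, not any library's API; the names (`KVCache`, `append_step`, `attend`) and the 2-d toy vectors are assumptions for the example.

```python
import math

class KVCache:
    def __init__(self):
        self.keys = []    # one key vector per past token
        self.values = []  # one value vector per past token

    def append_step(self, k, v):
        # At each generation step, store only the new token's key and
        # value; the history is reused rather than recomputed.
        self.keys.append(k)
        self.values.append(v)

def attend(q, cache):
    # Scaled dot-product attention of the current query against all
    # cached keys, returning a weighted average of cached values.
    d_k = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
              for k in cache.keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    dim = len(cache.values[0])
    return [sum(w * v[j] for w, v in zip(weights, cache.values))
            for j in range(dim)]

cache = KVCache()
cache.append_step([1.0, 0.0], [2.0, 0.0])  # token 1
cache.append_step([0.0, 1.0], [0.0, 2.0])  # token 2
out = attend([1.0, 0.0], cache)  # query aligns with token 1's key
```

In a multi-head model, one such cache would be kept per head per layer, matching the set-of-heads notation above.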
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Query (Attention)
Key (Attention)
Value (Attention)
State Function from Previous Outputs
Value Weight Matrix Formula
Set of Sequential Key-Value Pairs
Query Vector
Key Vector
Value Vector
Implicit Relative Position Modeling in Self-Attention with RoPE
Value Weight Matrix Definition ()
Imagine a system translating the sentence 'The quick brown fox jumps'. When the system is generating the output word corresponding to 'jumps', it needs to determine which words in the input sentence are most relevant. To do this, a vector representing the current translation context (i.e., 'what information do I need to produce the next word?') is compared against a set of searchable 'label' vectors, one for each word in the input sentence. This comparison generates a relevance score for each input word. Finally, a new vector is created by taking a weighted average of the 'content' vectors of the input words, using the relevance scores as weights. How do the three main vector types in this process correspond to their roles?
In a system designed to answer questions based on a provided document, the model first creates a representation of the user's question. It then compares this representation against a set of searchable representations, one for each sentence in the document, to determine relevance scores. Finally, it constructs an answer by creating a weighted combination of the informational content from each sentence, using the relevance scores as weights. Which option correctly assigns the roles of Query, Key, and Value vectors in this scenario?
Context Window of Key Vectors Notation
Key-Value Cache
In a computational mechanism designed to selectively focus on different parts of an input sequence, information is represented by three distinct types of vectors that interact to produce a context-aware output. Match each vector type to its specific role in this process.
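The query/key/value roles in the questions above can be made concrete with a toy example in the spirit of the translation scenario. The word vectors and the simple score normalization here are illustrative assumptions, not values from any trained model.

```python
# Keys act as searchable "labels", values carry the "content", and the
# query expresses what the current step needs.
words = ["quick", "fox", "jumps"]
keys = {"quick": [0.5, 0.5], "fox": [1.0, 0.0], "jumps": [0.0, 1.0]}
values = {"quick": [0.0, 1.0], "fox": [1.0, 2.0], "jumps": [3.0, 0.0]}

query = [0.0, 1.0]  # "what information do I need for the next word?"

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Relevance score of each input word = query . key.
scores = {w: dot(query, keys[w]) for w in words}
total = sum(scores.values())
weights = {w: scores[w] / total for w in words}

# Output = weighted average of the content (value) vectors.
output = [sum(weights[w] * values[w][j] for w in words) for j in range(2)]
```

Here the query matches "jumps"'s key most strongly, so the output is dominated by "jumps"'s value vector, exactly the weighted-average behavior the questions describe.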
Masked QKV Attention Formula
Set of Sequential Key-Value Pairs
Let a sequence of vectors be constructed where the first element is and the second element is . The third element has multiple potential versions, and the 5th version is given as . According to the notational definition , what is the specific sequence represented by when using the 5th version of the 3rd element?
Key Matrix for Causal Attention (K_≤i)
Deconstructing Vector Prefix Notation
Key-Value Cache
Consider a sequence of vectors represented as $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$. The notation $\mathbf{v}_{\le 2}$ represents the subsequence containing only the first two vectors, $(\mathbf{v}_1, \mathbf{v}_2)$.
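The "first $i$ elements" prefix notation (as in $K_{\le i}$) maps directly onto sequence slicing. A tiny sketch, with illustrative one-dimensional vectors:

```python
# A sequence of vectors v_1 .. v_4.
v = [[1.0], [2.0], [3.0], [4.0]]

# v_{<=2}: the subsequence of the first two vectors, (v_1, v_2).
v_le_2 = v[:2]
```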
Learn After
An autoregressive model is generating a sequence of text. It has already produced 100 tokens and is now in the process of generating the 101st token. The model uses an optimization where the key and value vectors for all 100 previous tokens are stored and reused. What is the primary computational advantage of this storage mechanism when calculating the attention for the 101st token?
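A back-of-the-envelope count makes the advantage in this question concrete. Counting key/value projections computed per generation run (a simplification that ignores the attention score computation itself):

```python
def projections_without_cache(n_tokens):
    # Without a cache, step t recomputes key/value projections for all
    # t tokens seen so far: 1 + 2 + ... + n, i.e. quadratic growth.
    return sum(t for t in range(1, n_tokens + 1))

def projections_with_cache(n_tokens):
    # With a cache, step t computes projections only for the newest
    # token, so the total grows linearly.
    return n_tokens

no_cache = projections_without_cache(101)
with_cache = projections_with_cache(101)
```

For the 101-token scenario above, the cached run performs 101 projections instead of 5151; the keys and values for the 100 stored tokens are simply read back when computing attention for token 101.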
Inference Performance Analysis
State of the Key-Value Cache During Generation