Definition

Key-Value Cache

In autoregressive transformer models, a Key-Value (KV) cache is a mechanism used to optimize the generation process during inference. It stores the computed key and value matrices from all previous steps, so they do not need to be recomputed for each new token being generated. Because the model performs attention on a group of feature sub-spaces in parallel, the KV cache must retain the keys and values for all τ\tau attention heads. The KV cache at generation step ii can be represented as the set of historical keys and values across these heads: {(Ki[1],Vi[1]),,(Ki[τ],Vi[τ])}\left\{ (\mathbf{K}_{\leq i}^{[1]}, \mathbf{V}_{\leq i}^{[1]}), \dots, (\mathbf{K}_{\leq i}^{[\tau]}, \mathbf{V}_{\leq i}^{[\tau]}) \right\}.

0

1

Updated 2026-04-23

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related