Learn Before
Reusability of Key-Value Pairs in Autoregressive Inference
During autoregressive inference, once the key and value vectors for a specific token are computed, they remain constant and are reused in all subsequent generation steps. For example, when generating the i-th token, the model attends to the key-value pairs of all preceding tokens (0 to i-1). These same pairs will be needed again when generating the (i+1)-th token, along with the newly generated pair for token i. This repeated usage makes re-computation inefficient and provides the primary motivation for the KV cache.
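The reuse described above can be sketched in a few lines. This is a minimal single-head illustration, not a production implementation: the projection matrices `Wq`, `Wk`, `Wv` and the helper names are assumptions for the example, and each step appends exactly one new key-value pair to the cache instead of recomputing earlier ones.

```python
import numpy as np

def attention_step(q, K, V):
    # Scaled dot-product attention for one query against all cached pairs.
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)           # similarity with each cached key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                    # weighted sum of cached values

def generate_with_kv_cache(token_embeddings, Wq, Wk, Wv):
    # The cache grows by one key/value pair per step; pairs computed at
    # earlier steps are reused verbatim, never recomputed.
    K_cache, V_cache, outputs = [], [], []
    for x in token_embeddings:
        K_cache.append(x @ Wk)            # computed once for this token
        V_cache.append(x @ Wv)            # ...and reused in all later steps
        q = x @ Wq                        # only the query is per-step work
        outputs.append(attention_step(q, np.array(K_cache), np.array(V_cache)))
    return np.array(outputs)
```

Note that per step the only new attention work is one query projection, one key-value projection for the current token, and one dot-product row against the cache; this is precisely the saving the KV cache provides.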
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Computational Cost per Token in Causal Attention
Reusability of Key-Value Pairs in Autoregressive Inference
Example of Query-Key Interactions in Causal Attention
An autoregressive model is generating a sequence of tokens one by one. It is currently calculating the attention output for the token at position 4 (i.e., the fifth token in the sequence). To ensure the model only uses information it has already seen, which set of key (K) and value (V) vectors must be used as input to the attention mechanism for the query vector at position 4 (q₄)?
Diagnosing Information Leakage in an Autoregressive Model
When calculating the attention output for a specific token at position i in an autoregressive model, the mechanism is structured to use the query vector from that same position (q_i), while the key and value matrices are composed of the corresponding vectors from all positions in the full input sequence.
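The correct behavior, in contrast to the leakage scenario above, restricts each query to keys and values at its own position or earlier. A minimal sketch of this causal masking (single head, no batching; the function name is an assumption for illustration):

```python
import numpy as np

def causal_attention(Q, K, V):
    # Full-sequence attention with a causal mask: the query at position i
    # may only attend to positions 0..i, so no future information leaks in.
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above diagonal
    scores[future] = -np.inf                            # block future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

A quick way to diagnose leakage with this setup: perturb the key and value vectors of a late position and confirm that the outputs at earlier positions are unchanged; if they shift, future information is leaking backward.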
Learn After
Key-Value (KV) Cache in Transformer Inference
Computational Efficiency in Autoregressive Generation
An autoregressive model is generating a sequence of text. To produce the 5th token, it computes attention using a query from position 5 and the key/value pairs from positions 1-4. When the model then proceeds to generate the 6th token, which statement accurately describes the most computationally efficient approach for handling the key and value pairs from the first four tokens (positions 1-4)?
During an autoregressive text generation process, to produce the 10th token in a sequence, the model must re-calculate the key and value vectors for all nine preceding tokens to ensure the contextual information is current.