Concept

Reusability of Key-Value Pairs in Autoregressive Inference

During autoregressive inference, once the key and value vectors for a token have been computed, they remain constant and are reused in every subsequent generation step. They do not change because, under causal attention, a token's key and value at each layer depend only on that token and the tokens before it, never on tokens generated later. For example, when generating the i-th token, the model attends to the key-value pairs of all preceding tokens (0 to i-1). The same pairs are needed again when generating the (i+1)-th token, together with the newly computed pair for token i. Recomputing them at every step is therefore wasteful, and this repeated reuse is the primary motivation for the KV cache, which stores each pair once and appends one new pair per step.
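The pure-Python sketch below illustrates this reuse for a single attention head in a single layer. It is a minimal illustration under stated assumptions, not any library's implementation: the projection matrices `Wq`, `Wk`, `Wv`, the `KVCache` class, and both decode functions are hypothetical, and the input vectors `xs` are assumed to be given up front (in real generation each new input comes from the previously sampled token, and a multi-layer model keeps one cache per layer). `decode_with_cache` projects each token to a key and value once and appends them to the cache, while `decode_without_cache` recomputes every pair at each step.

```python
import math
import random

# Hypothetical single-head, single-layer attention used only to illustrate K/V reuse.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

class KVCache:
    """Hypothetical cache: one key and one value per already-processed token."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def decode_with_cache(xs, Wq, Wk, Wv):
    """Each step projects only the newest token; earlier pairs are reused."""
    cache = KVCache()
    outputs, kv_projections = [], 0
    for x in xs:  # assumption: token inputs are known in advance
        cache.append(matvec(Wk, x), matvec(Wv, x))
        kv_projections += 1
        q = matvec(Wq, x)
        outputs.append(attend(q, cache.keys, cache.values))
    return outputs, kv_projections

def decode_without_cache(xs, Wq, Wk, Wv):
    """Each step recomputes the key/value pairs of every token seen so far."""
    outputs, kv_projections = [], 0
    for i in range(len(xs)):
        prefix = xs[: i + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        kv_projections += len(prefix)
        q = matvec(Wq, xs[i])
        outputs.append(attend(q, keys, values))
    return outputs, kv_projections

if __name__ == "__main__":
    random.seed(0)
    d, n = 4, 6

    def rand_matrix(rows, cols):
        return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

    Wq, Wk, Wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
    xs = rand_matrix(n, d)  # n hypothetical token vectors of size d
    cached, n_cached = decode_with_cache(xs, Wq, Wk, Wv)
    full, n_full = decode_without_cache(xs, Wq, Wk, Wv)
    same = all(abs(a - b) < 1e-12
               for oc, of in zip(cached, full) for a, b in zip(oc, of))
    print(same, n_cached, n_full)  # prints: True 6 21
```

Both versions return identical attention outputs; the difference is the amount of work: n key/value projections with the cache versus n(n+1)/2 without it for n tokens, paid for with memory that grows with the sequence length.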


Updated 2025-10-07


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences