Activity (Process)

Single-Step Generation with a KV Cache

During each step $i$ of autoregressive generation, the model computes a new query ($q_i$), key ($k_i$), and value ($v_i$) vector from the current input token. The new key-value pair $(k_i, v_i)$ is appended to the KV Cache, which holds the pairs for all preceding tokens. The attention operation is then performed using the new query $q_i$ and the complete set of keys and values stored in the cache up to the current step, denoted as $K_{\leq i}$ and $V_{\leq i}$. This process generates the output for step $i$ by allowing the current token to attend to itself and all previous tokens in the sequence.
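The step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a definitive implementation: it covers a single attention head with no batching, it assumes scaled dot-product attention with a $1/\sqrt{d}$ factor (the text only says "attention operation"), and the names `decode_step`, `matvec`, `cache`, `W_q`, `W_k`, and `W_v` are hypothetical, chosen here for illustration.

```python
import math

def matvec(x, W):
    """Multiply row vector x (length n) by matrix W (n x m); returns length m."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) for c in range(len(W[0]))]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decode_step(x_i, W_q, W_k, W_v, cache):
    """One generation step for a single head (illustrative sketch).

    x_i   : input vector for the current token
    cache : dict with lists "K" and "V" holding keys/values of earlier steps
    """
    # Compute the new query, key, and value from the current token only.
    q_i = matvec(x_i, W_q)
    k_i = matvec(x_i, W_k)
    v_i = matvec(x_i, W_v)

    # Append the new pair, so the cache now holds K_{<=i} and V_{<=i}.
    cache["K"].append(k_i)
    cache["V"].append(v_i)

    # Score q_i against every cached key (assumed 1/sqrt(d) scaling).
    d = len(q_i)
    scores = [sum(q * k for q, k in zip(q_i, k_j)) / math.sqrt(d)
              for k_j in cache["K"]]
    weights = softmax(scores)

    # Output is the attention-weighted sum of all cached values.
    return [sum(w * v_j[c] for w, v_j in zip(weights, cache["V"]))
            for c in range(len(v_i))]

# Usage: two steps with identity projections (toy values, not from the source).
I2 = [[1.0, 0.0], [0.0, 1.0]]
cache = {"K": [], "V": []}
out1 = decode_step([1.0, 0.0], I2, I2, I2, cache)  # attends only to itself
out2 = decode_step([0.0, 1.0], I2, I2, I2, cache)  # attends to both tokens
```

Note that each call projects only the current token; the keys and values of earlier tokens are read from the cache rather than recomputed, which is the saving the KV cache is meant to provide.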

Updated 2026-05-02

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.2 Generative Models - Foundations of Large Language Models
