Example

Diagram of a Single-Step KV Cache Update and Attention

This diagram illustrates a single step of autoregressive generation, focusing on the update of the Key-Value (KV) cache at position $i'+1$. The input at this new position passes through linear transformations that produce a new query vector $\mathbf{q}_{i'+1}$, key vector $\mathbf{k}_{i'+1}$, and value vector $\mathbf{v}_{i'+1}$. The new key-value pair is then appended to the KV cache, which already holds the pairs for all preceding positions $1$ to $i'$. Finally, the new query $\mathbf{q}_{i'+1}$ attends over every key-value pair in the updated cache (positions $1$ to $i'+1$) to compute the output for the current step.
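The step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's implementation: it assumes a single attention head, no batching, and standard scaled dot-product attention (the scaling by the square root of the key dimension is an assumption, since the description does not specify it). The function name `kv_cache_step`, the weight matrices `W_q`, `W_k`, `W_v`, and all dimensions are hypothetical.

```python
import numpy as np

def kv_cache_step(x_new, W_q, W_k, W_v, k_cache, v_cache):
    """One decoding step: project the new input, append to the cache, attend.

    Hypothetical helper for illustration only (single head, no batching).
    """
    # Linear transformations of the input at position i'+1.
    q = x_new @ W_q  # query, shape (d_k,)
    k = x_new @ W_k  # key,   shape (d_k,)
    v = x_new @ W_v  # value, shape (d_v,)

    # Append the new key-value pair to the cache holding positions 1..i'.
    k_cache = np.vstack([k_cache, k[None, :]])  # shape (i'+1, d_k)
    v_cache = np.vstack([v_cache, v[None, :]])  # shape (i'+1, d_v)

    # The new query attends over all cached positions 1..i'+1.
    # Scaling by sqrt(d_k) is assumed (standard scaled dot-product attention).
    scores = k_cache @ q / np.sqrt(q.shape[-1])
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()  # softmax over cached positions

    out = weights @ v_cache  # weighted sum of cached values, shape (d_v,)
    return out, k_cache, v_cache


# Usage with made-up sizes: three cached positions, then one new step.
rng = np.random.default_rng(0)
d_model, d_k, n_prev = 8, 4, 3
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
k_cache = rng.standard_normal((n_prev, d_k))
v_cache = rng.standard_normal((n_prev, d_k))
x_new = rng.standard_normal(d_model)

out, k_cache, v_cache = kv_cache_step(x_new, W_q, W_k, W_v, k_cache, v_cache)
print(k_cache.shape, v_cache.shape, out.shape)
```

Only the new position is projected at each step. The keys and values for positions $1$ to $i'$ are read from the cache rather than recomputed, and no causal mask is needed because the cache contains nothing beyond the current position.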

Image 0

Updated 2025-10-10

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences