Learn Before
Diagram of a Single-Step KV Cache Update and Attention
This diagram illustrates a single step of autoregressive generation, focusing on the update of the Key-Value (KV) cache at position i' + 1. An input at this new position undergoes linear transformations to generate a new query vector (q'), key vector (k'), and value vector (v'). The new key-value pair (k', v') is then appended to the KV cache, which already contains the pairs for all preceding positions from 1 to i'. Finally, the new query attends over the entire updated set of keys in the cache (positions 1 to i' + 1) to compute the output for the current step.
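The step described above can be sketched in code. This is a minimal illustrative sketch in NumPy (not from the source); the shapes, weight matrices, and cache length are assumed for demonstration, and a real implementation would use trained projection weights and multiple heads.

```python
import numpy as np

d = 8                                   # head dimension (illustrative)
rng = np.random.default_rng(0)

# The cache already holds keys/values for positions 1..i' (here i' = 5).
K_cache = rng.standard_normal((5, d))
V_cache = rng.standard_normal((5, d))

# Input at position i' + 1 undergoes linear transformations
# (random weights stand in for the model's trained projections).
x_new = rng.standard_normal(d)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
q_new, k_new, v_new = x_new @ W_q, x_new @ W_k, x_new @ W_v

# Append the new key-value pair to the cache (now positions 1..i'+1).
K_cache = np.vstack([K_cache, k_new])
V_cache = np.vstack([V_cache, v_new])

# The new query attends over ALL cached keys, then mixes the values.
scores = K_cache @ q_new / np.sqrt(d)   # one score per cached position
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # softmax over positions 1..i'+1
output = weights @ V_cache              # output for this single step
```

Note that only one query is computed per step: the cached keys and values are reused, so attention costs O(i' + 1) rather than recomputing all previous projections.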

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Single-Step Generation with a KV Cache
Formula for Updating the Key Matrix in the KV Cache
Formula for Updating the Value Matrix in the KV Cache
Example of a Single-Step KV Cache Update
During autoregressive text generation, a model has already processed N tokens and stored their corresponding key and value vectors in a cache. When the model processes the (N+1)-th token, how is this cache utilized and modified to compute the output for this new step?
An autoregressive model is generating a sequence and has just processed the token at position t. The Key-Value cache currently stores the key and value vectors for all tokens from position 1 to t. As the model processes the next token at position t+1, which statement correctly describes how the cache is updated and used for the attention calculation at this new step?
Notation for Current Query, Key, and Value Vectors (q', k', v')
Diagram of a Single-Step KV Cache Update and Attention
Debugging a Flawed KV Cache Implementation
Learn After
A language model is generating the 10th token in a sequence using an autoregressive process with a Key-Value (KV) cache. At this step, a new query vector (q₁₀), key vector (k₁₀), and value vector (v₁₀) are computed from the input. The KV cache already contains the key-value pairs from the first 9 steps. Which statement best analyzes the attention computation that occurs for this 10th step?
You are observing a single step of token generation in a large language model that uses a Key-Value cache. Arrange the following operations in the correct chronological order as they would occur during this single step.
An engineer is debugging an autoregressive language model and observes that as it generates longer sequences, its output progressively loses connection to the initial context. The engineer suspects a flaw in how the attention mechanism utilizes the Key-Value (KV) cache during each generation step. Based on the process where a new query attends to the full, updated cache, which of the following errors is the most probable cause for this specific type of performance degradation?