Example

Example of a Single-Step KV Cache Update

The update mechanism for a Key-Value (KV) cache during a single step of autoregressive generation can be illustrated as follows. Initially, the cache contains the key-value pairs for all preceding positions, from $1$ to $i'$. When the input for the next position, $i'+1$, is processed, it undergoes linear transformations to generate a new query vector $\mathbf{q}_{i'+1}$, key vector $\mathbf{k}_{i'+1}$, and value vector $\mathbf{v}_{i'+1}$. The newly generated key-value pair is then appended to the cache. Subsequently, the new query vector attends over the entire updated set of keys, from position $1$ to $i'+1$, and the resulting attention weights are applied to the corresponding cached values.
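The single step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: a single attention head, scaled dot-product attention (the text says only "attention operation"), plain Python lists instead of a tensor library, and randomly initialized projection matrices. The names `kv_cache_step`, `matvec`, `softmax`, `random_matrix`, and the dictionary layout of `cache` are hypothetical and chosen for this example.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kv_cache_step(x_new, W_q, W_k, W_v, cache):
    """Process the input at position i'+1 and update the cache in place.

    `cache` holds one key and one value vector per earlier position (1..i').
    """
    # Linear transformations of the new input give q, k, v for position i'+1.
    q = matvec(W_q, x_new)
    k = matvec(W_k, x_new)
    v = matvec(W_v, x_new)

    # Append the new key-value pair; the cache now covers positions 1..i'+1.
    cache["keys"].append(k)
    cache["values"].append(v)

    # The new query attends over every cached key, including its own.
    # Scaling by sqrt(d) is an assumption (scaled dot-product attention).
    d = len(q)
    scores = [
        sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d)
        for key in cache["keys"]
    ]
    weights = softmax(scores)

    # The output is the attention-weighted sum of the cached values.
    output = [
        sum(w * val[j] for w, val in zip(weights, cache["values"]))
        for j in range(len(v))
    ]
    return output, weights


def random_matrix(rows, cols):
    """Hypothetical stand-in for learned projection weights."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_model = 4
W_q, W_k, W_v = (random_matrix(d_model, d_model) for _ in range(3))
cache = {"keys": [], "values": []}

# Three generation steps: the cache grows by exactly one entry per step.
for step in range(3):
    x = [random.gauss(0.0, 1.0) for _ in range(d_model)]
    output, weights = kv_cache_step(x, W_q, W_k, W_v, cache)
```

Note that only the new position's key and value are computed at each step; the entries for positions $1$ to $i'$ are reused from the cache rather than recomputed, which is the point of keeping the cache.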

Updated 2025-10-10

Tags

Ch.5 Inference - Foundations of Large Language Models

Computing Sciences