Activity (Process)

Updating the KV Cache

Updating the Key-Value (KV) cache is an essential operation during autoregressive sequence generation. At each new position $i'$, the newly generated key vector $\mathbf{k}_{i'}$ and value vector $\mathbf{v}_{i'}$ are appended to their respective cache matrices, $\mathbf{K}$ and $\mathbf{V}$. Let $\mathrm{Append}(\mathbf{a}, \mathbf{b})$ denote a function that adds a row vector $\mathbf{b}$ to a matrix $\mathbf{a}$. The update rule is then $\mathbf{K} = \mathrm{Append}(\mathbf{K}, \mathbf{k}_{i'})$ and $\mathbf{V} = \mathrm{Append}(\mathbf{V}, \mathbf{v}_{i'})$. The cache thereby holds the key-value pairs of all positions processed so far, so a Transformer decoder can attend to past context without recomputing them at every step.
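The update rule can be illustrated with a minimal NumPy sketch. It is not taken from the source: the function names `append`, `update_kv_cache`, and `attend`, the dimension `d = 4`, and the random keys, values, and query are all assumptions chosen for illustration, and a single attention head without batching is assumed.

```python
import numpy as np

def append(a, b):
    """Return matrix a with row vector b added as its last row."""
    return np.vstack([a, b.reshape(1, -1)])

def update_kv_cache(K, V, k_new, v_new):
    """Apply K = Append(K, k_i') and V = Append(V, v_i')."""
    return append(K, k_new), append(V, v_new)

def attend(q, K, V):
    """Scaled dot-product attention of one query over the cached rows."""
    scores = K @ q / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V, weights

d = 4                      # assumed key/value dimension
K = np.empty((0, d))       # empty cache: zero rows, d columns
V = np.empty((0, d))
rng = np.random.default_rng(0)

for step in range(3):      # three generation steps
    # Stand-ins for the key and value computed at the new position.
    k_new = rng.standard_normal(d)
    v_new = rng.standard_normal(d)
    K, V = update_kv_cache(K, V, k_new, v_new)

q = rng.standard_normal(d)           # stand-in query at the latest position
out, weights = attend(q, K, V)
```

After three steps each cache holds three rows, one per generated position, and the query attends over all of them. Building a new array on every step keeps the sketch close to the Append notation; practical implementations often preallocate the cache and write into the next free row instead.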

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
