Formula

Formula for Cache State Evolution during Autoregressive Decoding

During the decoding phase of autoregressive generation, the Key-Value (KV) cache state corresponds to the sequence of tokens generated so far. The model uses this cache to predict the next token. This sequential process, where each prefix and its associated cache are used to generate the subsequent token, can be represented as:

$$
\begin{array}{lcl}
x_0 \quad (\text{predicts } x_1) & \Rightarrow & \text{cache}_1 \\
x_0 x_1 \quad (\text{predicts } x_2) & \Rightarrow & \text{cache}_2 \\
\quad \vdots & & \\
x_0 x_1 \dots x_{m-1} \quad (\text{predicts } x_m) & \Rightarrow & \text{cache}_m
\end{array}
$$

Here, $\text{cache}_m$ is the KV cache state containing the key-value pairs for the prefix $x_0 x_1 \dots x_{m-1}$.
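The cache evolution described here can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: `kv_for_token` and `predict_next` are hypothetical stand-ins for a real model's key/value projections and its attention plus output layer, and each cache entry is a pair of floats rather than per-layer, per-head tensors. What the sketch does show is the bookkeeping: at step m the cache gains exactly one entry, for token x_{m-1}, and all earlier entries are reused unchanged.

```python
from typing import List, Tuple

# Toy (key, value) pair; a real model stores tensors per layer and per head.
KV = Tuple[float, float]


def kv_for_token(token: int) -> KV:
    """Hypothetical stand-in for the key/value projections of one token."""
    return (float(token), float(token) * 0.5)


def predict_next(cache: List[KV]) -> int:
    """Hypothetical stand-in for attention over the cache plus the output layer."""
    return int(sum(k + v for k, v in cache)) % 7 + 1


def decode(x0: int, steps: int) -> Tuple[List[int], List[List[KV]]]:
    """Run `steps` decoding steps and record cache_1 ... cache_steps."""
    tokens = [x0]
    cache: List[KV] = []
    snapshots: List[List[KV]] = []
    for _ in range(steps):
        # Append the KV pair of the newest token x_{m-1}; older entries are reused.
        cache.append(kv_for_token(tokens[-1]))
        snapshots.append(list(cache))       # cache_m covers x_0 ... x_{m-1}
        tokens.append(predict_next(cache))  # x_m, predicted from cache_m
    return tokens, snapshots


tokens, snapshots = decode(x0=3, steps=4)
for m, snap in enumerate(snapshots, start=1):
    print(f"cache_{m}: {len(snap)} entries for prefix {tokens[:m]}, predicts x_{m} = {tokens[m]}")
```

Each snapshot is a prefix of the next one, which is why the per-step cost of updating the cache does not depend on recomputing earlier keys and values.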

Updated 2025-10-08

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences