1Cademy - Formula for Cache State Evolution during Autoregressive Decoding

Learn Before

Single-Step Autoregressive Generation with a Key-Value (KV) Cache

Formula

Formula for Cache State Evolution during Autoregressive Decoding

During the decoding phase of autoregressive generation, the Key-Value (KV) cache state corresponds to the sequence of tokens generated so far, and the model uses this cache to predict the next token. This sequential process, in which each prefix and its associated cache are used to generate the subsequent token, can be written as:

$x_0 \quad (\text{predicts } x_1) \Rightarrow \text{cache}_1$
$x_0x_1 \quad (\text{predicts } x_2) \Rightarrow \text{cache}_2$
$\dots$
$x_0x_1\dots x_{m-1} \quad (\text{predicts } x_m) \Rightarrow \text{cache}_m$

Here, $\text{cache}_m$ is the KV cache state containing the key-value pairs for the prefix $x_0x_1\dots x_{m-1}$.
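The progression above can be traced with a toy decoding loop. This is a minimal sketch under stated assumptions, not a real model: `ToyKVCache`, `project_kv`, and `toy_predict` are hypothetical names, there is a single "layer", keys and values are plain floats, and the "prediction" is an arbitrary function of the cached values rather than an actual attention computation. What it does show is the bookkeeping: at each step only the newest token is projected and appended, so after step $i$ the cache holds exactly the pairs for $x_0 \dots x_{i-1}$ (that is, $\text{cache}_i$), and that cache is used to predict $x_i$.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyKVCache:
    """Hypothetical single-layer cache: one (key, value) pair per processed token."""
    keys: List[float] = field(default_factory=list)
    values: List[float] = field(default_factory=list)

    def append(self, k: float, v: float) -> None:
        self.keys.append(k)
        self.values.append(v)

    def __len__(self) -> int:
        return len(self.keys)


def project_kv(token: int) -> Tuple[float, float]:
    # Hypothetical stand-in for the learned key/value projections of one token.
    return float(token), float(2 * token)


def toy_predict(cache: ToyKVCache) -> int:
    # Hypothetical "model": reads every cached value (as attention would) and
    # maps their sum to a token id in 1..10. Not a real attention computation.
    return int(sum(cache.values)) % 10 + 1


def decode(x0: int, m: int):
    """Run m decoding steps from x0; step i builds cache_i and predicts x_i."""
    cache = ToyKVCache()
    tokens = [x0]
    trace = []  # (prefix, predicted token, cache length) for each step
    for i in range(1, m + 1):
        # Only the newest token x_{i-1} is projected; earlier pairs are reused.
        k, v = project_kv(tokens[-1])
        cache.append(k, v)               # cache now covers x_0 .. x_{i-1}: cache_i
        next_token = toy_predict(cache)  # predicts x_i
        trace.append((tuple(tokens), next_token, len(cache)))
        tokens.append(next_token)
    return tokens, cache, trace


if __name__ == "__main__":
    tokens, cache, trace = decode(x0=1, m=3)
    for prefix, nxt, n in trace:
        print(f"{prefix} (predicts {nxt}) => cache_{n}")
```

Running the script prints one line per step, mirroring the formula:

(1,) (predicts 3) => cache_1
(1, 3) (predicts 9) => cache_2
(1, 3, 9) (predicts 7) => cache_3

In each line the prefix length equals the cache index, which is the invariant the formula states: $\text{cache}_m$ always covers the $m$ tokens $x_0 \dots x_{m-1}$, one more than $\text{cache}_{m-1}$.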


Updated 2026-06-21


References

Reference of Foundations of Large Language Models Course
