Formula for Prefix Cache State Generation

During the prefilling phase for an input sequence $\mathbf{x} = x_0 x_1 \cdots x_{m-1}$ of $m$ tokens, we generate every prefix of the sequence together with its corresponding Key-Value (KV) cache state. This mapping is defined as:

$$
\begin{matrix}
x_0\ (\mathbf{x}_{<1}) & \Rightarrow & \mathrm{cache}_{<1} \\
x_0 x_1\ (\mathbf{x}_{<2}) & \Rightarrow & \mathrm{cache}_{<2} \\
& \vdots & \\
x_0 x_1 \cdots x_{m-1}\ (\mathbf{x}_{<m}) & \Rightarrow & \mathrm{cache}_{<m}
\end{matrix}
$$

where $\mathrm{cache}_{<i}$ denotes the KV cache state for the prefix $\mathbf{x}_{<i}$. All of these mappings can be stored in the prefix cache for efficient reuse.
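The mapping above can be illustrated with a minimal Python sketch. It is not a real inference engine: `toy_kv`, `build_prefix_cache`, and `longest_cached_prefix` are hypothetical names introduced here, and the per-token key/value pair is a placeholder for the tensors a model would actually compute (in a real Transformer these depend on the preceding tokens as well). The sketch only shows how each prefix $\mathbf{x}_{<i}$ is paired with $\mathrm{cache}_{<i}$ and how a later request might look up its longest stored prefix.

```python
from typing import Dict, List, Tuple

Token = int
KV = Tuple[float, float]  # toy stand-in for one token's (key, value) tensors


def toy_kv(token: Token, position: int) -> KV:
    """Hypothetical placeholder for the model's per-token key/value computation."""
    return (float(token), float(token * 10 + position))


def build_prefix_cache(x: List[Token]) -> Dict[Tuple[Token, ...], List[KV]]:
    """Map every prefix x_{<i}, for i = 1..m, to its cache state cache_{<i}."""
    prefix_cache: Dict[Tuple[Token, ...], List[KV]] = {}
    cache: List[KV] = []
    for i, token in enumerate(x):
        cache.append(toy_kv(token, i))  # extend cache_{<i} to cache_{<i+1}
        prefix_cache[tuple(x[: i + 1])] = list(cache)  # snapshot for this prefix
    return prefix_cache


def longest_cached_prefix(
    prefix_cache: Dict[Tuple[Token, ...], List[KV]], y: List[Token]
) -> Tuple[int, List[KV]]:
    """Return (number of reusable tokens, cache state) for a new sequence y."""
    for i in range(len(y), 0, -1):
        state = prefix_cache.get(tuple(y[:i]))
        if state is not None:
            return i, state
    return 0, []


# Prefill one sequence, then look up a second sequence that shares two tokens.
prefix_cache = build_prefix_cache([11, 22, 33])
reused, state = longest_cached_prefix(prefix_cache, [11, 22, 99])
```

Here the second sequence shares the prefix `11, 22` with the first, so only its last token would need fresh computation. Storing a full snapshot per prefix is chosen for clarity only; since $\mathrm{cache}_{<i+1}$ extends $\mathrm{cache}_{<i}$ by one token, a practical implementation would likely share that storage rather than copy it.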

Updated 2026-05-05

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences