Activity (Process)

Process of Utilizing a Prefix Cache

When processing a new input sequence $\mathbf{x}'$, the system checks whether it shares a common prefix with a previously cached sequence $\mathbf{x}$. If the two sequences match on a prefix of length $k$, that is, $\mathbf{x}'_{<k} = \mathbf{x}_{<k}$, the corresponding Key-Value (KV) cache state $\mathrm{cache}_{<k}$ is loaded directly and used to initialize the KV cache for $\mathbf{x}'$. The model thus skips the redundant computation over the shared prefix and computes hidden states only for the remaining tokens $\mathbf{x}'_{\ge k}$.
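The procedure above can be illustrated with a minimal Python sketch. It assumes a toy stand-in for the model: `compute_kv`, `PrefixCache`, and the integer-tuple KV entries are hypothetical illustrations, not part of any library. The sketch relies on causal attention: the KV entry at position $i$ depends only on tokens $0..i$, so $\mathrm{cache}_{<k}$ from $\mathbf{x}$ remains valid for any $\mathbf{x}'$ with $\mathbf{x}'_{<k} = \mathbf{x}_{<k}$. For clarity, it finds the longest match with a linear scan over cached sequences. A practical system would likely use a more efficient index, such as a trie.

```python
from typing import List, Tuple

# Hypothetical stand-in for the key/value pair stored at one position.
KV = Tuple[int, int]


def compute_kv(prefix: Tuple[int, ...]) -> KV:
    """Toy placeholder for the model's KV computation at position len(prefix) - 1.

    It depends only on tokens 0..i, mimicking causal attention.
    """
    return (sum(prefix), prefix[-1])


class PrefixCache:
    """Minimal sketch: keeps the per-position KV cache of processed sequences."""

    def __init__(self) -> None:
        self.entries: List[Tuple[Tuple[int, ...], List[KV]]] = []
        self.computed = 0  # positions whose KV was actually computed

    def _longest_prefix(self, tokens: Tuple[int, ...]) -> Tuple[int, List[KV]]:
        """Return k and cache_{<k} for the longest prefix shared with a cached sequence."""
        best_k: int = 0
        best_cache: List[KV] = []
        for cached_tokens, cached_kv in self.entries:
            k = 0
            limit = min(len(tokens), len(cached_tokens))
            while k < limit and tokens[k] == cached_tokens[k]:
                k += 1
            if k > best_k:
                best_k, best_cache = k, cached_kv[:k]
        return best_k, best_cache

    def process(self, tokens: List[int]) -> Tuple[int, List[KV]]:
        x = tuple(tokens)
        k, cache = self._longest_prefix(x)  # load cache_{<k}
        kv = list(cache)                     # initialize the KV cache from it
        for i in range(k, len(x)):           # compute only x'_{>=k}
            kv.append(compute_kv(x[: i + 1]))
            self.computed += 1
        self.entries.append((x, kv))         # make this sequence reusable too
        return k, kv


if __name__ == "__main__":
    pc = PrefixCache()
    pc.process([1, 2, 3, 4])             # nothing cached yet: 4 positions computed
    k, kv = pc.process([1, 2, 3, 9, 9])  # shares the prefix [1, 2, 3]
    print(k, pc.computed)                # prints "3 6"
```

The second call reuses the three cached positions and computes only the two new ones. The resulting cache is identical to the one that recomputing the whole sequence from scratch would produce.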


Updated 2026-05-05


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
