Formula for KV Cache Prefilling

The prefilling of the Key-Value (KV) cache, a preparatory step for autoregressive inference, is represented by the formula:

$$\text{cache} = \text{Dec}_{\text{kv}}(\mathbf{x})$$

In this equation, $\text{Dec}_{\text{kv}}(\cdot)$ denotes the LLM's decoding network, which is architecturally identical to the standard decoding network $\text{Dec}(\cdot)$. The key distinction is that $\text{Dec}_{\text{kv}}(\cdot)$ is configured to output the keys and values computed by its self-attention layers, rather than the final token representations, so that it stores the key-value pairs for the entire input sequence $\mathbf{x}$.
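
As a rough illustration, the following Python sketch builds a toy decoder in which $\text{Dec}(\cdot)$ and $\text{Dec}_{\text{kv}}(\cdot)$ share one causal forward pass and differ only in what they return. All names (`dec`, `dec_kv`, `decode_step`, `make_layer`) and the simplified layer (query/key/value projections plus a residual connection, with no feed-forward block, normalization, or multiple heads) are hypothetical assumptions for illustration, not the book's implementation. The usage at the end prefills the cache from a prompt, processes one more token against that cache, and compares the result with recomputing the whole sequence from scratch.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of a single query over keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def make_layer(d, rng):
    """Hypothetical toy layer: random Q/K/V projections (residual added in use)."""
    def rand_matrix():
        return [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(d)]
    return {"Wq": rand_matrix(), "Wk": rand_matrix(), "Wv": rand_matrix()}


def _forward(x, layers):
    """Shared causal forward pass; returns (final hidden states, per-layer K/V)."""
    h = [list(t) for t in x]
    cache = []
    for layer in layers:
        keys = [matvec(layer["Wk"], t) for t in h]
        values = [matvec(layer["Wv"], t) for t in h]
        cache.append((keys, values))
        # Causal mask: position i attends only to positions 0..i.
        h = [[a + b for a, b in zip(t, attend(matvec(layer["Wq"], t),
                                               keys[:i + 1], values[:i + 1]))]
             for i, t in enumerate(h)]
    return h, cache


def dec(x, layers):
    """Dec(x): returns the final token representations."""
    return _forward(x, layers)[0]


def dec_kv(x, layers):
    """Dec_kv(x): same network, but returns the per-layer KV cache instead."""
    return _forward(x, layers)[1]


def decode_step(x_new, layers, cache):
    """Process one new token, reading and extending the prefilled cache in place."""
    h = list(x_new)
    for layer, (keys, values) in zip(layers, cache):
        keys.append(matvec(layer["Wk"], h))
        values.append(matvec(layer["Wv"], h))
        h = [a + b for a, b in zip(h, attend(matvec(layer["Wq"], h), keys, values))]
    return h


# Usage: prefill from a 5-token prompt, then decode one more token.
rng = random.Random(0)
d = 4
layers = [make_layer(d, rng) for _ in range(2)]
prompt = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(5)]
x_next = [rng.gauss(0.0, 1.0) for _ in range(d)]

cache = dec_kv(prompt, layers)                 # prefilling: cache = Dec_kv(x)
h_cached = decode_step(x_next, layers, cache)  # reuses the cached keys/values
h_full = dec(prompt + [x_next], layers)[-1]    # recomputes everything
max_diff = max(abs(a - b) for a, b in zip(h_cached, h_full))
print("max difference vs. full recomputation:", max_diff)
```

Because each position attends only to itself and earlier positions, the keys and values prefilled for the prompt do not change when new tokens arrive, which is why decoding can reuse them instead of re-running the network over $\mathbf{x}$.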

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences