Formula for KV Cache Prefilling
The prefilling of the Key-Value (KV) cache, a preparatory step for autoregressive inference, is represented by the formula:

KVCache = Dec_kv(x)

In this equation, Dec_kv(·) represents the LLM's decoding network, which is architecturally identical to the standard decoding network, Dec(·). The key distinction is that Dec_kv(·) is configured to output the KV cache from its self-attention layers, rather than the final token representations, effectively storing the key-value pairs for the entire input sequence, x.
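To make the formula concrete, here is a minimal Python sketch of prefilling, assuming a toy two-layer, single-head self-attention stack with random weights standing in for a trained model; the names dec_kv, D_MODEL, LAYERS, and SEQ_LEN are illustrative choices, not from the source. The point it shows is that dec_kv runs the whole prompt x through the network once and returns the per-layer key/value pairs rather than next-token predictions.

```python
# Minimal sketch of KV-cache prefilling (toy, single-head, random weights).
# All names and shapes here are assumptions for illustration only.
import numpy as np

D_MODEL, LAYERS, SEQ_LEN = 16, 2, 5
rng = np.random.default_rng(0)

# One (W_q, W_k, W_v) projection triple per layer, stand-ins for trained weights.
weights = [
    {name: rng.standard_normal((D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
     for name in ("W_q", "W_k", "W_v")}
    for _ in range(LAYERS)
]

def self_attention(h, W_q, W_k, W_v):
    """Causal single-head attention; returns the layer output plus its K and V."""
    q, k, v = h @ W_q, h @ W_k, h @ W_v
    scores = q @ k.T / np.sqrt(D_MODEL)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones_like(scores, dtype=bool), 1)
    scores = np.where(mask, -np.inf, scores)
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v, k, v

def dec_kv(x_embeddings):
    """Prefilling: pass the whole prompt through every layer once and
    return the per-layer key/value pairs instead of token predictions."""
    kv_cache = []
    h = x_embeddings
    for layer in weights:
        h, k, v = self_attention(h, **layer)
        kv_cache.append({"K": k, "V": v})   # stored for later decoding steps
    return kv_cache

# Prompt x as a [seq_len, d_model] matrix of (fake) token embeddings.
x = rng.standard_normal((SEQ_LEN, D_MODEL))
cache = dec_kv(x)
print(len(cache), cache[0]["K"].shape)      # -> 2 (5, 16)
```

In this sketch, later decoding steps would compute a query for each new token and attend over the cached K and V matrices, so the prompt never has to be re-encoded.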

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for KV Cache Prefilling
Prefix Caching for LLM Inference
Prefilling as an Encoding Process
Disaggregation of Prefilling and Decoding using Pipelined Engines
Prefilling in One Go (Standard Prefilling)
A large language model is given a 1000-token document to process before it begins generating a new, multi-token response. Which statement best analyzes the fundamental computational difference between how the model processes the initial 1000-token document versus how it will subsequently generate each new token for its response?
LLM Inference Performance Analysis
Parallel Self-Attention in the Prefilling Phase
The Role and Output of the Prefilling Phase
You run an internal LLM inference service for empl...
You’re on-call for an internal LLM chat service. M...
You operate a GPU-backed LLM service that uses con...
Your company’s internal LLM service handles many c...
Evaluating a serving design that combines prefix caching with paged KV memory under mixed prompt lengths
Choosing a KV-cache strategy for shared-prefix traffic under GPU memory pressure
Diagnosing and Redesigning KV-Cache Memory Behavior in a Multi-Tenant LLM Serving Stack
Stabilizing latency and GPU memory in a chat-completions service with shared system prompts
Root-cause and mitigation plan for OOMs and latency spikes during shared-prefix, long-generation traffic
Post-incident analysis: KV-cache growth, fragmentation, and shared-prefix reuse in a streaming LLM service
Decoding Network for KV Cache Generation
Layer-wise Processing in Transformer Inference
Formula for KV Cache Prefilling
A researcher is building a sequence processing model and describes one of its core layers. The layer is designed to first apply a self-attention mechanism to its input sequence, and then, for each position in the sequence, it applies the same two-layer neural network independently. Based on this description, which statement accurately identifies a potential flaw or misunderstanding in the researcher's design compared to a standard Transformer decoding network layer?
A single token's data is being processed by a standard Transformer decoding network. Arrange the following operations in the correct sequence as the data flows through the network's core components, starting from the initial input.
Diagnosing a Faulty Decoding Network
Match each core component of a Transformer decoding network to its primary function within the network's architecture.
Next-Token Probability Calculation in a Transformer Decoder
Learn After
Layer-wise Structure of the KV Cache
A large language model processes an input prompt, denoted as x, using a function Dec_kv(x) as part of its inference process. This function utilizes the model's standard decoding network but is configured for a specific preparatory task. Based on this context, what is the primary output of the Dec_kv(x) function?
In the context of prefilling a Key-Value cache for an input prompt, the function Dec_kv(·) represents a neural network with a fundamentally different architecture than the standard decoding network, Dec(·), as it is specialized solely for computing key-value pairs.
Relationship Between Decoding Networks for Inference