1Cademy - Decoding Network for KV Cache Generation

Learn Before

Prefilling Phase in Transformer Inference

Concept

Decoding Network for KV Cache Generation

The decoding network that generates the Key-Value (KV) cache, denoted $\mathrm{Dec}_{\mathrm{kv}}(\cdot)$, has the same architecture as the standard decoding network used for token prediction. The two differ only in what they return: instead of the output representations of the tokens, $\mathrm{Dec}_{\mathrm{kv}}(\cdot)$ returns the KV cache, that is, the keys and values computed by the self-attention mechanism in every layer while the input is processed.
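The relationship between the two networks can be illustrated with a minimal NumPy sketch. It is not taken from the source: the names `run_layers`, `dec`, and `dec_kv` are hypothetical, the weights are random, and each layer is reduced to single-head causal self-attention with a residual connection (feed-forward sublayers and normalization are omitted). The point is only that both functions run the same forward pass and differ in which result they return.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention; returns the output plus the keys and values."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask future positions so token i attends only to tokens 0..i.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, k, v

def run_layers(x, layers):
    """Shared forward pass: final hidden states and the per-layer (K, V) pairs."""
    cache = []
    h = x
    for w_q, w_k, w_v in layers:
        attn_out, k, v = causal_self_attention(h, w_q, w_k, w_v)
        h = h + attn_out  # residual connection (assumption: no FFN or layer norm)
        cache.append((k, v))
    return h, cache

def dec(x, layers):
    """Standard decoding network: returns the token output representations."""
    return run_layers(x, layers)[0]

def dec_kv(x, layers):
    """Same network, but returns the multi-layer KV cache instead."""
    return run_layers(x, layers)[1]

# Toy configuration (illustrative values only).
rng = np.random.default_rng(0)
d_model, n_layers, seq_len = 8, 3, 5
layers = [
    tuple(rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
    for _ in range(n_layers)
]
prompt = rng.normal(size=(seq_len, d_model))

hidden = dec(prompt, layers)       # shape (seq_len, d_model)
kv_cache = dec_kv(prompt, layers)  # list of n_layers (K, V) pairs
```

In this sketch the cache holds one `(K, V)` pair per layer, each of shape `(seq_len, d_model)`. Because attention is causal, the entries cached for a prefix of the prompt equal the corresponding rows of the cache for the full prompt, which is what makes the cache reusable in later decoding steps.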

Updated 2026-05-03


References

Reference of Foundations of Large Language Models Course
