Learn Before
Layer-wise Structure of the KV Cache
The overall Key-Value (KV) cache generated by a Transformer's decoding network is a composite structure containing the individual KV caches from each of its layers. For a model with L layers, the complete cache is represented as a collection of these layer-specific caches: Cache = {Cache_1, Cache_2, ..., Cache_L}, where Cache_l is the KV cache from the l-th layer.
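The layer-wise structure described above can be sketched in code. The following is a minimal illustration (not from the course text): the complete cache is simply a list holding one (K, V) pair per layer, and each layer's cache grows by one entry per processed token. The names num_layers, d_head, and append_token are illustrative assumptions, and the tensor shapes are a simplified convention.

```python
# Illustrative sketch: a KV cache organized layer by layer.
# num_layers, d_head, append_token are hypothetical names, not from the text.
import numpy as np

num_layers = 12   # L: number of decoder layers
num_heads = 4
d_head = 8

# The complete cache is a collection with one (K, V) pair per layer.
# Each K and V has shape (num_heads, seq_len, d_head), growing as tokens arrive.
kv_cache = [
    (np.zeros((num_heads, 0, d_head)), np.zeros((num_heads, 0, d_head)))
    for _ in range(num_layers)
]

def append_token(cache, layer, k_new, v_new):
    """Append one token's keys/values to the given layer's cache."""
    k, v = cache[layer]
    cache[layer] = (
        np.concatenate([k, k_new[:, None, :]], axis=1),
        np.concatenate([v, v_new[:, None, :]], axis=1),
    )

# Simulate prefilling a 5-token prompt: every layer stores one K/V per token.
for _ in range(5):
    for layer in range(num_layers):
        append_token(kv_cache, layer,
                     np.random.randn(num_heads, d_head),
                     np.random.randn(num_heads, d_head))

print(len(kv_cache))         # 12 layer-specific caches (Cache_1 ... Cache_L)
print(kv_cache[3][0].shape)  # (4, 5, 8): layer 4's keys after a 5-token prompt
```

Each layer keeps its own cache because each layer computes its own keys and values from its own hidden states; no single shared tensor could serve all layers.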

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Layer-wise Structure of the KV Cache
A large language model processes an input prompt, denoted as x, using a function Dec_kv(x) as part of its inference process. This function utilizes the model's standard decoding network but is configured for a specific preparatory task. Based on this context, what is the primary output of the Dec_kv(x) function?

In the context of prefilling a Key-Value cache for an input prompt, the function Dec_kv(·) represents a neural network with a fundamentally different architecture than the standard decoding network, Dec(·), as it is specialized solely for computing key-value pairs.

Relationship Between Decoding Networks for Inference
Learn After
KV Cache Memory Scaling
A developer is examining the internal state of a 12-layer Transformer decoder after it has processed an input prompt. They notice that the generated Key-Value (KV) cache is not a single, large data structure, but is instead organized as a collection of 12 separate caches. What is the fundamental reason for this layer-wise organization?
Accessing a Specific Layer's KV Cache