Formula

Layer-wise Structure of the KV Cache

The overall key-value (KV) cache produced by a Transformer's decoding network is a composite structure made up of the individual KV caches of its layers. For a model with $L$ layers, the complete cache is the collection of these layer-specific caches:

$$\text{cache} = \{\text{cache}^1, \dots, \text{cache}^L\}$$

where $\text{cache}^i$ is the KV cache of the $i$-th layer.
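The layer-wise structure can be sketched as a list holding one small container per layer. This is a minimal illustration only, not how any particular library stores its cache: the names `LayerKVCache` and `make_cache`, the toy sizes, and the placeholder key/value vectors are all assumptions made for the example. Note that Python lists are 0-indexed, so `cache[0]` plays the role of $\text{cache}^1$.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


@dataclass
class LayerKVCache:
    """Hypothetical per-layer cache: one key and one value vector per cached token."""
    keys: List[Vector] = field(default_factory=list)
    values: List[Vector] = field(default_factory=list)

    def append(self, key: Vector, value: Vector) -> None:
        # Store the key/value pair computed for the newest token.
        self.keys.append(key)
        self.values.append(value)

    def __len__(self) -> int:
        # Number of tokens currently cached in this layer.
        return len(self.keys)


def make_cache(num_layers: int) -> List[LayerKVCache]:
    """Build cache = {cache^1, ..., cache^L} as a list of L per-layer caches."""
    return [LayerKVCache() for _ in range(num_layers)]


# Toy setup (assumed values): L = 3 layers, hidden size d = 4, 2 decoding steps.
L, d = 3, 4
cache = make_cache(L)

for step in range(2):
    for i, layer_cache in enumerate(cache):
        # Placeholder vectors standing in for the layer's real key/value projections.
        key = [float(step + i)] * d
        value = [float(step - i)] * d
        layer_cache.append(key, value)

print(len(cache))     # number of layer-specific caches
print(len(cache[0]))  # tokens cached in the first layer
```

Each decoding step appends one key/value pair to every layer's cache, so all $L$ per-layer caches grow in lockstep with the sequence length.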

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences