Formula for KV Cache Memory Size

The memory footprint of the Key-Value (KV) cache for a given context window size can be quantified. The total size is proportional to the product of four key parameters: the number of layers in the model ($L$), the number of attention heads per layer ($\tau$), the dimensionality of each head's key/value vectors ($d_h$), and the size of the context window ($m_w$). The overall memory complexity is therefore $O(L \cdot \tau \cdot d_h \cdot m_w)$.
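To make the scaling concrete, the sketch below turns the asymptotic expression into a rough byte count. The big-O form hides constant factors, so the function makes two assumptions that are not part of the formula above: a factor of 2 because both a key and a value vector are stored per token per head, and a fixed number of bytes per stored value (2, as for 16-bit floats). The function name, parameter names, and the example model dimensions are all hypothetical and chosen only for illustration.

```python
def kv_cache_bytes(num_layers, num_heads, head_dim, context_len,
                   bytes_per_value=2, batch_size=1):
    """Rough KV cache size in bytes.

    num_layers  -> L   (layers in the model)
    num_heads   -> tau (attention heads per layer)
    head_dim    -> d_h (dimensionality of each head's key/value vectors)
    context_len -> m_w (context window size, in tokens)

    Assumptions (not part of the O(L * tau * d_h * m_w) formula):
    - factor 2: one key vector and one value vector per token per head
    - bytes_per_value: storage width of each element (2 for 16-bit floats)
    - batch_size: number of sequences cached at once
    """
    return (2 * num_layers * num_heads * head_dim * context_len
            * bytes_per_value * batch_size)


# Hypothetical configuration: 32 layers, 32 heads, head dimension 128,
# context window of 4096 tokens, 16-bit values, a single sequence.
size = kv_cache_bytes(num_layers=32, num_heads=32, head_dim=128, context_len=4096)
print(f"{size / 2**30:.1f} GiB")  # prints "2.0 GiB"
```

Because the expression is a plain product, doubling any one of the four parameters doubles the cache size. This is why the cache grows linearly with the context window when the model architecture is held fixed.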

Updated 2026-04-23

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences