Formula

Space Complexity of the KV Cache

During inference, the space complexity of the key-value (KV) cache is directly proportional to the number of tokens for which keys and values are stored. This relationship is captured by the formula $O(L \cdot \tau \cdot d_h \cdot m)$, where $L$ is the number of layers, $\tau$ is the number of attention heads, $d_h$ is the head dimension, and $m$ is the number of tokens being cached.
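To make the linear dependence on the number of cached tokens concrete, the following is a minimal sketch that turns the asymptotic expression into a rough byte count. The function names, the factor of 2 (one tensor for keys, one for values), the 2-byte element size, and the example model dimensions are all illustrative assumptions, not values taken from the text above; batch size and any sharing of key/value heads are ignored.

```python
def kv_cache_elements(num_layers: int, num_heads: int, head_dim: int, num_tokens: int) -> int:
    """Count the cached scalars: L * tau * d_h * m, doubled for keys and values.

    The factor of 2 is a constant that the big-O expression drops.
    """
    return 2 * num_layers * num_heads * head_dim * num_tokens


def kv_cache_bytes(num_layers: int, num_heads: int, head_dim: int, num_tokens: int,
                   bytes_per_element: int = 2) -> int:
    """Rough memory footprint, assuming a fixed element size (2 bytes, e.g. 16-bit floats)."""
    return kv_cache_elements(num_layers, num_heads, head_dim, num_tokens) * bytes_per_element


# Hypothetical configuration: L = 32 layers, tau = 32 heads, d_h = 128, m = 4096 tokens.
size = kv_cache_bytes(num_layers=32, num_heads=32, head_dim=128, num_tokens=4096)
print(f"{size / 2**30:.1f} GiB")  # prints "2.0 GiB"

# Doubling the number of cached tokens doubles the footprint (linear in m).
doubled = kv_cache_bytes(num_layers=32, num_heads=32, head_dim=128, num_tokens=8192)
print(doubled // size)  # prints 2
```

With $L$, $\tau$, and $d_h$ fixed by the model architecture, $m$ is the only term that grows during generation, which is why the cache footprint scales with sequence length.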

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
