Learn Before
KV Cache Size in Grouped-Query Attention (GQA)
The memory required for the Key-Value (KV) cache in a Grouped-Query Attention (GQA) model grows as O(m · n_group · d), where m is the sequence length, d is the dimensionality of each key/value head, and n_group is the number of shared key-value groups. Because the cache size depends directly on n_group, adjusting this parameter allows for a trade-off between memory/computational efficiency and model expressiveness. Specifically, when n_group equals the number of query heads n_head, the architecture operates as a standard multi-head attention model, whereas setting n_group to a value between 1 and n_head configures it as the GQA model (with n_group = 1 corresponding to multi-query attention).
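To make the scaling concrete, here is a minimal Python sketch (not part of the original card) that estimates KV-cache size under a hypothetical fp16 configuration with 32 query heads, head dimension 128, 32 layers, and an 8k-token context, comparing the MHA, GQA, and MQA settings of n_group:

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_group: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size: two tensors (K and V), each of shape
    [seq_len, n_group, head_dim] per layer, stored in fp16 (2 bytes/value)."""
    return 2 * seq_len * n_layers * n_group * head_dim * bytes_per_value

# Hypothetical configuration (illustrative values only).
n_head, head_dim, n_layers, seq_len = 32, 128, 32, 8192

mha = kv_cache_bytes(seq_len, n_layers, n_group=n_head, head_dim=head_dim)  # n_group = n_head -> MHA
gqa = kv_cache_bytes(seq_len, n_layers, n_group=8, head_dim=head_dim)       # 1 < n_group < n_head -> GQA
mqa = kv_cache_bytes(seq_len, n_layers, n_group=1, head_dim=head_dim)       # n_group = 1 -> MQA

print(f"MHA: {mha / 2**30:.2f} GiB")          # 4.00 GiB
print(f"GQA (8 groups): {gqa / 2**30:.2f} GiB")  # 1.00 GiB
print(f"MQA: {mqa / 2**30:.3f} GiB")          # 0.125 GiB
```

With these assumed values, shrinking n_group from 32 to 8 cuts the cache from 4 GiB to 1 GiB, illustrating the linear dependence on the number of key-value groups.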
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Attention Head Output with Grouped Queries and Causal Masking
Attention Head Output in Grouped-Query Attention (GQA)
GQA as an Interpolation Between MHA and MQA
An engineering team is designing a large language model for a real-time translation application on a smartphone. The key constraints are low latency (fast response time) and a small memory footprint. However, maintaining high translation quality is also crucial. The team is debating the architecture of the model's attention layers. Which of the following approaches represents the most effective trade-off for this specific use case?
An attention layer in a transformer model is configured with 32 query heads. These query heads are organized into 8 distinct groups, where all heads within a single group share the same key and value projections. Based on this configuration, how many unique key/value projection pairs are used in this layer?
An architect is designing a new transformer model and is considering different configurations for the attention mechanism. Match each Grouped-Query Attention (GQA) configuration to the specific attention behavior it produces.
You’re leading an LLM platform team that must supp...
You’re debugging an LLM inference service that mus...
Your team is deploying a chat-based LLM that must ...
Selecting an Attention Design for Long-Context, Low-Latency Inference
Diagnosing and Redesigning Attention for a Long-Context, Cost-Constrained LLM Service
Choosing an Attention Stack for a Regulated, Long-Document Review Assistant
You’re reviewing a design doc for a Transformer at...
Attention Redesign for a Long-Context Customer-Support Copilot Under GPU Memory Pressure
Attention Architecture Choice for On-Device Meeting Summarization with 60k Context
Attention Redesign for a Multi-Tenant LLM with Long Context and Strict KV-Cache Budgets
Sets of Keys and Values in Grouped-Query Attention (GQA)