Formula

KV Cache Size in Grouped-Query Attention (GQA)

The memory required for the Key-Value (KV) cache in a Grouped-Query Attention (GQA) model grows as $O(L \cdot n_g \cdot d_h \cdot m)$. Because the size scales linearly with the number of shared key-value groups, $n_g$, adjusting this parameter allows for a trade-off between computational efficiency and model expressiveness. Specifically, when $n_g = \tau$, where $\tau$ is the number of attention heads, every head keeps its own keys and values and the architecture is standard multi-head attention. When $n_g = 1$, all heads share a single set of keys and values, which is multi-query attention (MQA). GQA covers the range between these two extremes.
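To make the scaling concrete, here is a minimal Python sketch of the size estimate. It assumes that $L$ is the number of layers, $d_h$ the per-head dimension, and $m$ the number of cached positions. It also assumes a batch size of 1 and 2 bytes per element (for example fp16). The function name, parameter names, and the configuration values are hypothetical and chosen only for illustration. The factor of 2 for storing both keys and values is a constant that the $O(\cdot)$ expression omits.

```python
def kv_cache_bytes(num_layers, num_kv_groups, head_dim, seq_len,
                   bytes_per_element=2):
    """Estimate KV cache size in bytes for one sequence (batch size 1).

    Mirrors O(L * n_g * d_h * m); the leading 2 counts keys and values.
    """
    elements = 2 * num_layers * num_kv_groups * head_dim * seq_len
    return elements * bytes_per_element


# Hypothetical configuration: L = 32 layers, tau = 32 heads,
# d_h = 128, m = 4096 cached positions.
L, tau, d_h, m = 32, 32, 128, 4096

mha = kv_cache_bytes(L, tau, d_h, m)  # n_g = tau: multi-head attention
gqa = kv_cache_bytes(L, 8, d_h, m)    # n_g = 8: grouped-query attention
mqa = kv_cache_bytes(L, 1, d_h, m)    # n_g = 1: multi-query attention

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**20:.0f} MiB")
```

Under these assumptions the cache shrinks in direct proportion to $n_g$: going from 32 groups to 8 cuts it by a factor of 4, and going to a single group cuts it by a factor of 32.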

Updated 2026-04-23

Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
