Formula

KV Cache Size in Multi-Query Attention

In Multi-Query Attention (MQA), a single key head and a single value head are shared across all attention heads, rather than each head keeping its own copy. Because of this sharing, the memory footprint of the Key-Value (KV) cache is significantly smaller than in standard multi-head attention: the head count $h$ drops out of the complexity, giving a KV cache size of $O(L \cdot d_h \cdot m)$, where $L$ is the number of layers, $d_h$ is the per-head dimension, and $m$ is the number of cached tokens. Standard multi-head attention, by contrast, stores $O(L \cdot h \cdot d_h \cdot m)$ entries.
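The saving can be sketched with a short calculation. This is a minimal illustration, not any particular model's configuration: the function and the layer/head/dimension values below are hypothetical, and a real cache would also scale with batch size and bytes per element.

```python
def kv_cache_elements(num_layers, seq_len, head_dim, num_kv_heads):
    """Elements held in the KV cache: keys + values (factor of 2),
    per layer, per cached token, per KV head, per head dimension."""
    return 2 * num_layers * seq_len * num_kv_heads * head_dim

# Hypothetical configuration: L layers, m cached tokens,
# head dimension d_h, and h query heads.
L, m, d_h, h = 32, 4096, 128, 32

mha = kv_cache_elements(L, m, d_h, num_kv_heads=h)  # standard multi-head
mqa = kv_cache_elements(L, m, d_h, num_kv_heads=1)  # multi-query: shared K/V

print(mha // mqa)  # MQA shrinks the cache by the head count h, here 32
```

Dropping the `num_kv_heads` factor to 1 is exactly the removal of the head-count multiplier: the MQA cache grows as $O(L \cdot d_h \cdot m)$ regardless of how many query heads the model uses.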


Updated 2026-04-23


Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences