Learn Before
GQA as an Interpolation Between MHA and MQA
Grouped-Query Attention (GQA) provides a flexible framework that interpolates between standard multi-head attention (MHA) and Multi-Query Attention (MQA), allowing for a direct trade-off between model expressiveness and computational efficiency. This trade-off is controlled by the number of key-value groups, G, into which the query heads are partitioned. When G equals the number of query heads, each head has its own key-value pair and the model reduces to standard multi-head attention. By contrast, when G = 1, all query heads share a single key-value pair and the model reduces to MQA. Intermediate values of G give the GQA configurations in between.
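Below is a minimal sketch (not the book's reference implementation) of how this interpolation can be expressed in code. The function name, tensor shapes, and the choice of repeating key/value heads with repeat_interleave are illustrative assumptions; the point is only that a single num_groups parameter moves the layer between MHA (num_groups == num_heads) and MQA (num_groups == 1).

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, num_groups):
    """Illustrative GQA: q is (batch, num_heads, seq, head_dim);
    k and v are (batch, num_groups, seq, head_dim). num_groups must
    divide num_heads; each group of query heads shares one K/V head."""
    batch, num_heads, seq, head_dim = q.shape
    heads_per_group = num_heads // num_groups
    # Repeat each shared K/V head so it lines up with its group of query heads.
    k = k.repeat_interleave(heads_per_group, dim=1)  # -> (batch, num_heads, seq, head_dim)
    v = v.repeat_interleave(heads_per_group, dim=1)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    return F.softmax(scores, dim=-1) @ v

# num_groups == num_heads -> standard MHA
# num_groups == 1         -> MQA
# in between              -> GQA
batch, seq, head_dim, num_heads, num_groups = 2, 16, 64, 8, 4
q = torch.randn(batch, num_heads, seq, head_dim)
k = torch.randn(batch, num_groups, seq, head_dim)
v = torch.randn(batch, num_groups, seq, head_dim)
print(grouped_query_attention(q, k, v, num_groups).shape)  # torch.Size([2, 8, 16, 64])
```

Note that the KV cache scales with num_groups rather than num_heads, which is why smaller G reduces memory at inference time while larger G preserves more per-head expressiveness.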
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Attention Head Output with Grouped Queries and Causal Masking
Attention Head Output in Grouped-Query Attention (GQA)
GQA as an Interpolation Between MHA and MQA
An engineering team is designing a large language model for a real-time translation application on a smartphone. The key constraints are low latency (fast response time) and a small memory footprint. However, maintaining high translation quality is also crucial. The team is debating the architecture of the model's attention layers. Which of the following approaches represents the most effective trade-off for this specific use case?
An attention layer in a transformer model is configured with 32 query heads. These query heads are organized into 8 distinct groups, where all heads within a single group share the same key and value projections. Based on this configuration, how many unique key/value projection pairs are used in this layer?
An architect is designing a new transformer model and is considering different configurations for the attention mechanism. Match each Grouped-Query Attention (GQA) configuration to the specific attention behavior it produces.
You’re leading an LLM platform team that must supp...
You’re debugging an LLM inference service that mus...
Your team is deploying a chat-based LLM that must ...
Selecting an Attention Design for Long-Context, Low-Latency Inference
Diagnosing and Redesigning Attention for a Long-Context, Cost-Constrained LLM Service
Choosing an Attention Stack for a Regulated, Long-Document Review Assistant
You’re reviewing a design doc for a Transformer at...
Attention Redesign for a Long-Context Customer-Support Copilot Under GPU Memory Pressure
Attention Architecture Choice for On-Device Meeting Summarization with 60k Context
Attention Redesign for a Multi-Tenant LLM with Long Context and Strict KV-Cache Budgets
Sets of Keys and Values in Grouped-Query Attention (GQA)
KV Cache Size in Grouped-Query Attention (GQA)
Learn After
An engineer is designing a large language model and is deciding on the architecture for its attention layers. The model is configured to have 64 query heads. The engineer uses an attention variant where these query heads are partitioned into groups, and all heads within a group share the same key and value projections. If the engineer sets the number of key-value groups to 1, which statement best analyzes the resulting configuration?
Optimizing Attention Mechanisms for Different Applications
An engineer is configuring an attention layer with 32 query heads. This layer uses a grouped-query approach where query heads are partitioned into groups, with each group sharing a single key and value projection. Match each configuration for the number of key-value groups to its resulting characteristic.