
GQA as an Interpolation Between MHA and MQA

Grouped-Query Attention (GQA) provides a flexible framework that interpolates between standard multi-head attention (MHA) and Multi-Query Attention (MQA), allowing a direct trade-off between model expressiveness and computational efficiency. This trade-off is controlled by adjusting the number of key-value groups, n_g. When n_g = τ, where τ is the number of attention heads, each head has its own key-value pair and the model reduces to standard multi-head attention. By contrast, when n_g = 1, all heads share a single key-value pair and the model reduces to MQA.
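The interpolation can be made concrete with a minimal NumPy sketch (the function name, shapes, and the head-to-group mapping below are illustrative assumptions, not taken from the source). Query heads are split evenly into n_g groups, and every head in a group attends over that group's shared key-value pair; setting n_g equal to the number of heads recovers MHA, while n_g = 1 recovers MQA.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, n_groups):
    """GQA with n_groups key-value groups (illustrative sketch).

    q: (n_heads, seq_len, d)   -- one query projection per head
    k, v: (n_groups, seq_len, d) -- one key/value projection per group
    Each query head h uses the key-value group h // (n_heads // n_groups).
    """
    n_heads, seq_len, d = q.shape
    assert n_heads % n_groups == 0, "heads must divide evenly into groups"
    heads_per_group = n_heads // n_groups
    out = np.empty_like(q)
    for h in range(n_heads):
        g = h // heads_per_group          # which shared KV group this head reads
        scores = q[h] @ k[g].T / np.sqrt(d)
        out[h] = softmax(scores) @ v[g]
    return out

# n_groups = n_heads: every head has its own KV pair (standard MHA).
# n_groups = 1: all heads share one KV pair (MQA).
```

Intermediate values of n_g keep most of MQA's memory savings (fewer key-value projections to cache) while retaining more of MHA's expressiveness than a single shared key-value pair would.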

Updated 2026-04-23

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences