Attention Head Output in Grouped-Query Attention (GQA)
The output computation for a specific attention head in a Grouped-Query Attention (GQA) model depends on its assigned key-value group. If g(j) represents the group ID for the j-th head, the head's output is calculated using the formula:

head_j = Att_qkv(q_i^[j], K_<=i^[g(j)], V_<=i^[g(j)])

In this expression, the unique query vector q_i^[j] for the current token attends to the keys K_<=i^[g(j)] and values V_<=i^[g(j)] that are shared within its respective group g(j).
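A minimal NumPy sketch of this computation, assuming illustrative names (att_qkv, gqa_head_output, group_of) that are not part of the original card:

```python
import numpy as np

def att_qkv(q, K, V):
    """Scaled dot-product attention for one query vector q against the
    cached keys K and values V covering positions <= i."""
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)            # one score per cached position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over positions
    return weights @ V                     # weighted sum of value rows

def gqa_head_output(j, q_heads, K_groups, V_groups, group_of):
    """head_j = Att_qkv(q_i^[j], K_<=i^[g(j)], V_<=i^[g(j)])."""
    g = group_of(j)                        # group ID g(j) for head j
    return att_qkv(q_heads[j], K_groups[g], V_groups[g])

# Example: 4 query heads sharing 2 KV groups (heads 1-2 -> group 1, 3-4 -> group 2).
rng = np.random.default_rng(0)
q_heads = {j: rng.standard_normal(64) for j in range(1, 5)}
K = {g: rng.standard_normal((10, 64)) for g in (1, 2)}
V = {g: rng.standard_normal((10, 64)) for g in (1, 2)}
out = gqa_head_output(3, q_heads, K, V, group_of=lambda j: (j - 1) // 2 + 1)
```

Every head keeps its own query q_i^[j], but heads mapped to the same group g(j) read from the same cached K and V matrices.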

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Attention Head Output with Grouped Queries and Causal Masking
Attention Head Output in Grouped-Query Attention (GQA)
A computational model processes sequences and, at a specific step i, maintains a collection of data represented as {(K_<=i^[t], V_<=i^[t])} for t = 1, ..., τ. In this set, each element (K_<=i^[t], V_<=i^[t]) is a pair of matrices, the subscript <=i indicates that the matrices contain information for all sequence positions from the start up to position i, and the superscript [t] is an index ranging from 1 to τ. Based on this structure, which statement provides the most accurate analysis of the collection?
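A small sketch of one plausible reading of this collection as a per-index key-value cache at step i (the names kv_cache, tau, and the sizes are illustrative assumptions):

```python
import numpy as np

# At step i, keep for every index t = 1..τ the cached key and value
# matrices covering positions 1..i.
tau, i, d_k, d_v = 4, 10, 64, 64
kv_cache = {
    t: (np.zeros((i, d_k)),   # K_<=i^[t]: keys for positions 1..i
        np.zeros((i, d_v)))   # V_<=i^[t]: values for positions 1..i
    for t in range(1, tau + 1)
}
# Generating the next token appends one row to each pair; τ stays fixed.
```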
Interpreting a Set of Indexed Key-Value Pairs
State of Key-Value Cache During Generation
GQA as an Interpolation Between MHA and MQA
An engineering team is designing a large language model for a real-time translation application on a smartphone. The key constraints are low latency (fast response time) and a small memory footprint. However, maintaining high translation quality is also crucial. The team is debating the architecture of the model's attention layers. Which of the following approaches represents the most effective trade-off for this specific use case?
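The memory side of this trade-off can be made concrete with a rough KV-cache estimate (all model sizes below are illustrative assumptions, not from the question):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Rough per-sequence KV-cache size; the factor 2 covers both K and V."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative small model: 24 layers, head_dim 64, 4k context, fp16 cache.
for name, kv_heads in [("MHA (16 KV heads)", 16), ("GQA (4 KV groups)", 4), ("MQA (1 KV head)", 1)]:
    mib = kv_cache_bytes(layers=24, kv_heads=kv_heads, head_dim=64, seq_len=4096) / 2**20
    print(f"{name}: {mib:.0f} MiB per sequence")   # 384, 96, 24 MiB
```

Fewer KV heads shrink the cache and speed up decoding, at some cost in quality relative to full multi-head attention.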
An attention layer in a transformer model is configured with 32 query heads. These query heads are organized into 8 distinct groups, where all heads within a single group share the same key and value projections. Based on this configuration, how many unique key/value projection pairs are used in this layer?
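A quick sanity check of the counting involved, using the numbers from the question (a sketch):

```python
query_heads = 32
kv_groups = 8
heads_per_group = query_heads // kv_groups   # 32 / 8 = 4 query heads share each group
unique_kv_pairs = kv_groups                  # one key/value projection pair per group
print(heads_per_group, unique_kv_pairs)      # 4 8
```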
An architect is designing a new transformer model and is considering different configurations for the attention mechanism. Match each Grouped-Query Attention (GQA) configuration to the specific attention behavior it produces.
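The limiting cases behind this matching can be sketched as a small classifier (the function name gqa_behavior is illustrative):

```python
def gqa_behavior(num_query_heads: int, num_kv_groups: int) -> str:
    """Classify a GQA configuration by how keys/values are shared."""
    if num_kv_groups == num_query_heads:
        return "multi-head attention: every head has its own K/V"
    if num_kv_groups == 1:
        return "multi-query attention: all heads share one K/V"
    return "grouped-query attention: K/V shared within each group"

print(gqa_behavior(32, 32))  # MHA limit
print(gqa_behavior(32, 1))   # MQA limit
print(gqa_behavior(32, 8))   # intermediate GQA
```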
You’re leading an LLM platform team that must supp...
You’re debugging an LLM inference service that mus...
Your team is deploying a chat-based LLM that must ...
Selecting an Attention Design for Long-Context, Low-Latency Inference
Diagnosing and Redesigning Attention for a Long-Context, Cost-Constrained LLM Service
Choosing an Attention Stack for a Regulated, Long-Document Review Assistant
You’re reviewing a design doc for a Transformer at...
Attention Redesign for a Long-Context Customer-Support Copilot Under GPU Memory Pressure
Attention Architecture Choice for On-Device Meeting Summarization with 60k Context
Attention Redesign for a Multi-Tenant LLM with Long Context and Strict KV-Cache Budgets
Sets of Keys and Values in Grouped-Query Attention (GQA)
KV Cache Size in Grouped-Query Attention (GQA)
Learn After
In a specific attention mechanism, there are 8 query heads (indexed j=1 to 8) and 2 distinct Key-Value (KV) groups (indexed g=1 to 2). Query heads 1 through 4 are assigned to KV group 1, while query heads 5 through 8 are assigned to KV group 2. The output for a given query head j is calculated based on its own query vector q^[j] and the Key-Value pair from its assigned group, (K^[g(j)], V^[g(j)]). Which Key-Value pair will query head 6 use for its computation?

In a grouped-query attention system with 12 query heads (indexed j=1 to 12), the function g(j) maps a query head j to its corresponding key-value group. This mapping is defined by the formula g(j) = floor((j-1) / 3) + 1. Based on this, which of the following pairs of query heads will use the same set of Key and Value matrices for their attention computation?

Consider an attention mechanism where the output for a head j is computed by the formula head_j = Att_qkv(q_i^[j], K_<=i^[g(j)], V_<=i^[g(j)]). In this setup, q_i^[j] is a query vector unique to head j, while the function g(j) maps head j to a potentially shared key-value group. Statement: If two distinct query heads, j1 and j2, are mapped to the same key-value group (meaning g(j1) = g(j2)), their final output vectors, head_j1 and head_j2, will necessarily be identical.
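A small sketch of the head-to-group mappings used in these questions (group_of is an illustrative name):

```python
def group_of(j: int, heads_per_group: int) -> int:
    """g(j) = floor((j-1) / heads_per_group) + 1, with heads indexed from 1."""
    return (j - 1) // heads_per_group + 1

# 8 query heads, 2 KV groups (4 heads per group): head 6 falls in group 2.
print(group_of(6, heads_per_group=4))                       # 2

# 12 query heads with g(j) = floor((j-1)/3) + 1.
print({j: group_of(j, heads_per_group=3) for j in range(1, 13)})
# heads 1-3 -> group 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
```

Note that even when g(j1) = g(j2), each head still applies its own query vector q_i^[j], so sharing a KV group does not in general make the head outputs identical.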