Formula

Attention Head Output in Grouped-Query Attention (GQA)

The output computation for a specific attention head $j$ in a Grouped-Query Attention (GQA) model depends on its assigned key-value group. If $g(j)$ denotes the group ID of the $j$-th head, the head's output is

$$\mathrm{head}_j = \mathrm{Att}_{\mathrm{qkv}}\!\left(\mathbf{q}_{i}^{[j]},\ \mathbf{K}_{\le i}^{[g(j)]},\ \mathbf{V}_{\le i}^{[g(j)]}\right)$$

In this expression, the head-specific query vector $\mathbf{q}_{i}^{[j]}$ for the current token attends to the keys $\mathbf{K}_{\le i}^{[g(j)]}$ and values $\mathbf{V}_{\le i}^{[g(j)]}$ that are shared by all heads in its group $g(j)$.
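A minimal NumPy sketch of this computation, assuming standard scaled dot-product attention for $\mathrm{Att}_{\mathrm{qkv}}$ and a simple group assignment $g(j) = \lfloor j / (\text{heads per group}) \rfloor$ (the helper name and toy dimensions are illustrative, not from the source):

```python
import numpy as np

def gqa_head_output(q_j, K_group, V_group):
    """Scaled dot-product attention for one GQA head (illustrative helper).

    q_j:     (d,)      query vector of head j at the current position i
    K_group: (i+1, d)  keys of group g(j) for positions <= i
    V_group: (i+1, d)  values of group g(j) for positions <= i
    """
    d = q_j.shape[-1]
    scores = K_group @ q_j / np.sqrt(d)       # attention logits over positions <= i
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ V_group                  # weighted sum of group values

# Toy setup: 4 query heads share 2 KV groups via g(j) = j // 2
num_heads, num_groups, seq_len, d = 4, 2, 5, 8
rng = np.random.default_rng(0)
q = rng.normal(size=(num_heads, d))           # per-head queries at position i
K = rng.normal(size=(num_groups, seq_len, d)) # keys shared within each group
V = rng.normal(size=(num_groups, seq_len, d)) # values shared within each group

heads = [gqa_head_output(q[j], K[j // 2], V[j // 2]) for j in range(num_heads)]
```

Note that heads `0` and `1` read the same `K[0]`, `V[0]` cache entries, which is exactly how GQA shrinks the KV cache relative to standard multi-head attention.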


Updated 2026-05-02


Tags

Ch.2 Generative Models - Foundations of Large Language Models
