Grouped-Query Attention (GQA)

Grouped-Query Attention (GQA) is an attention mechanism that generalizes both standard multi-head attention (MHA) and Multi-Query Attention (MQA). In GQA, the query heads are partitioned into $n_g$ groups, and all heads within a group share a single set of keys and values. Setting $n_g$ equal to the number of heads recovers MHA, while setting $n_g = 1$ recovers MQA. An intermediate $n_g$ balances model expressiveness against computational efficiency. Only $n_g$ key and value projections are needed instead of one per head, which also shrinks the key-value cache stored during inference.
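The grouping can be illustrated with a minimal NumPy sketch of a single GQA layer. It rests on four assumptions:

- unmasked self-attention;
- no output projection;
- equal-sized groups;
- head h reads the keys and values of group h // (heads per group).

The function and parameter names (`grouped_query_attention`, `w_q`, `w_k`, `w_v`) are hypothetical and do not come from any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(x, w_q, w_k, w_v, n_heads, n_groups):
    """Hypothetical GQA sketch (no mask, no output projection).

    x:   (seq_len, d_model)
    w_q: (d_model, n_heads * d_head)   one query projection per head
    w_k: (d_model, n_groups * d_head)  one key projection per group
    w_v: (d_model, n_groups * d_head)  one value projection per group
    Returns an array of shape (seq_len, n_heads * d_head).
    """
    assert n_heads % n_groups == 0, "heads must split evenly into groups"
    seq_len = x.shape[0]
    d_head = w_q.shape[1] // n_heads
    heads_per_group = n_heads // n_groups

    # Project and split into heads: queries per head, keys/values per group.
    q = (x @ w_q).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)   # (n_heads, T, d)
    k = (x @ w_k).reshape(seq_len, n_groups, d_head).transpose(1, 0, 2)  # (n_groups, T, d)
    v = (x @ w_v).reshape(seq_len, n_groups, d_head).transpose(1, 0, 2)  # (n_groups, T, d)

    # Share each group's keys/values with every head in that group:
    # after repeating, head h uses group h // heads_per_group.
    k = np.repeat(k, heads_per_group, axis=0)  # (n_heads, T, d)
    v = np.repeat(v, heads_per_group, axis=0)  # (n_heads, T, d)

    # Standard scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, T, T)
    out = softmax(scores, axis=-1) @ v                    # (n_heads, T, d)

    # Concatenate the heads back into one feature dimension.
    return out.transpose(1, 0, 2).reshape(seq_len, n_heads * d_head)

# Usage: 4 query heads sharing 2 key/value groups.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads, n_groups, d_head = 5, 16, 4, 2, 4
x = rng.standard_normal((seq_len, d_model))
w_q = rng.standard_normal((d_model, n_heads * d_head))
w_k = rng.standard_normal((d_model, n_groups * d_head))
w_v = rng.standard_normal((d_model, n_groups * d_head))

y = grouped_query_attention(x, w_q, w_k, w_v, n_heads, n_groups)
print(y.shape)    # (5, 16)
print(w_k.shape)  # (16, 8): key projections for 2 groups rather than 4 heads
```

The sketch repeats the grouped keys and values along the head axis so that a single batched matmul covers all heads. This keeps the code short but materializes the repeated tensors. A memory-conscious implementation would more likely broadcast over a separate group axis. Passing `n_groups=n_heads` or `n_groups=1` to the same function yields the MHA and MQA special cases.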

Updated 2026-04-23

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences