Learn Before
  • Improved Multi-Head Attention Mechanism

Grouped-Query Attention (GQA)

Grouped-Query Attention (GQA) is an attention mechanism that strikes a balance between the computational efficiency of Multi-Query Attention (MQA) and the expressiveness of standard Multi-Head Attention (MHA). It partitions the query heads into groups, with each group sharing a single Key (K) and Value (V) projection. The number of groups, denoted n_g, is a tunable parameter that trades off inference cost against model quality: setting n_g = 1 recovers MQA (all query heads share one K/V pair), while setting n_g equal to the number of query heads recovers standard MHA.
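
Below is a minimal PyTorch sketch of this grouping, written for illustration only; the class name GroupedQueryAttention and the parameter names (n_heads, n_groups) are assumptions of this sketch, not names from the course material. The key idea is that K and V are projected once per group, and each shared K/V head is then reused by n_heads / n_groups query heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Illustrative GQA sketch (not a reference implementation).

    n_groups = n_heads reduces to standard MHA; n_groups = 1 reduces to MQA.
    """
    def __init__(self, d_model: int, n_heads: int, n_groups: int):
        super().__init__()
        assert n_heads % n_groups == 0, "query heads must divide evenly into groups"
        self.n_heads, self.n_groups = n_heads, n_groups
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim)
        # K and V get one projection per group, not per query head:
        # this is the memory/compute saving relative to MHA.
        self.k_proj = nn.Linear(d_model, n_groups * self.head_dim)
        self.v_proj = nn.Linear(d_model, n_groups * self.head_dim)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        # Reshape to (batch, heads, seq, head_dim).
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_groups, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_groups, self.head_dim).transpose(1, 2)
        # Each shared K/V head serves n_heads // n_groups query heads.
        repeat = self.n_heads // self.n_groups
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v)  # (B, n_heads, T, head_dim)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))

# Example: 8 query heads sharing 2 K/V heads (4 query heads per group).
attn = GroupedQueryAttention(d_model=512, n_heads=8, n_groups=2)
y = attn(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```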

Tags
  • Ch.2 Generative Models - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
Related
  • Multi-Query Attention (MQA)
  • Grouped-Query Attention (GQA)
  • Cross-layer Multi-head Attention

Learn After
  • Relationship between GQA, MHA, and MQA