Learn Before
Improved Multi-Head Attention Mechanism
Grouped-Query Attention (GQA)
Grouped-Query Attention (GQA) is an attention mechanism that balances the computational efficiency of Multi-Query Attention (MQA) against the expressiveness of standard Multi-Head Attention (MHA). It works by partitioning the query heads into groups, with each group sharing a single Key (K) and Value (V) projection. The number of groups, denoted n_g, is an adjustable parameter that trades off computational efficiency against model quality: setting n_g = 1 recovers MQA, while setting n_g equal to the number of query heads recovers MHA.
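As a concrete illustration, here is a minimal sketch of GQA in PyTorch. The class and parameter names (d_model, n_heads, n_groups) are illustrative assumptions, and production implementations would also handle attention masking, KV caching, and positional embeddings:

```python
import torch
import torch.nn.functional as F
from torch import nn

class GroupedQueryAttention(nn.Module):
    """Sketch of GQA: n_heads query heads share n_groups K/V heads."""

    def __init__(self, d_model: int, n_heads: int, n_groups: int):
        super().__init__()
        assert d_model % n_heads == 0, "d_model must divide evenly across heads"
        assert n_heads % n_groups == 0, "query heads must divide evenly into groups"
        self.n_heads = n_heads
        self.n_groups = n_groups
        self.head_dim = d_model // n_heads
        # One Q projection per query head, but only n_groups K/V projections.
        self.w_q = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.w_k = nn.Linear(d_model, n_groups * self.head_dim, bias=False)
        self.w_v = nn.Linear(d_model, n_groups * self.head_dim, bias=False)
        self.w_o = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.w_q(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.w_k(x).view(b, t, self.n_groups, self.head_dim).transpose(1, 2)
        v = self.w_v(x).view(b, t, self.n_groups, self.head_dim).transpose(1, 2)
        # Each shared K/V head serves n_heads // n_groups query heads.
        repeat = self.n_heads // self.n_groups
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        # Standard scaled dot-product attention over the expanded K/V.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.w_o(out)

# Example: 8 query heads sharing 2 K/V heads (n_g = 2).
gqa = GroupedQueryAttention(d_model=512, n_heads=8, n_groups=2)
y = gqa(torch.randn(4, 16, 512))  # -> shape (4, 16, 512)
```

The efficiency gain comes from the K/V projections and cache shrinking by a factor of n_heads / n_groups, while the query heads retain their full diversity.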
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Multi-Query Attention (MQA)
Grouped-Query Attention (GQA)
Cross-layer Multi-head Attention
Learn After
Relationship between GQA, MHA, and MQA