Learn Before
Improved Multi-Head Attention Mechanism
Multi-Query Attention (MQA)
Multi-Query Attention (MQA) is an architectural refinement of standard multi-head attention designed for greater efficiency. In MQA, the Key (K) and Value (V) projections are shared across all attention heads, so for a given step i there is only a single set of keys and values. Each head, however, keeps its own distinct query projection, allowing different heads to attend to different aspects of the input while reducing the size of the key-value cache and the memory bandwidth needed at inference, making MQA more computationally and memory efficient than standard multi-head attention.
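The following is a minimal PyTorch sketch (framework and names such as MultiQueryAttention, d_model, and n_heads are illustrative assumptions, not taken from the course) showing the key idea: queries are projected per head, while a single key/value head is shared and broadcast across all query heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    """Illustrative MQA: per-head queries, one shared key/value head."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Each head gets its own query projection...
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        # ...but keys and values are projected to a single head shared by all.
        self.w_k = nn.Linear(d_model, self.d_head, bias=False)
        self.w_v = nn.Linear(d_model, self.d_head, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Queries: (batch, heads, seq, d_head)
        q = self.w_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Shared keys/values: (batch, 1, seq, d_head), broadcast over heads
        k = self.w_k(x).unsqueeze(1)
        v = self.w_v(x).unsqueeze(1)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = attn @ v  # broadcasts the single K/V head to every query head
        out = out.transpose(1, 2).reshape(b, t, self.n_heads * self.d_head)
        return self.w_o(out)

# Example: 8 query heads sharing one key/value head
mqa = MultiQueryAttention(d_model=64, n_heads=8)
y = mqa(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

Because only one key/value head is stored per step, the cached K and V tensors shrink by a factor equal to the number of heads relative to standard multi-head attention.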
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Multi-Query Attention (MQA)
Grouped-Query Attention (GQA)
Cross-layer Multi-head Attention
Learn After
Individual Attention Head Formula in Multi-Query Attention (MQA)