Learn Before
  • Improved Multi-Head Attention Mechanism

Multi-Query Attention (MQA)

Multi-Query Attention (MQA) is an architectural refinement of the standard multi-head attention mechanism designed for greater efficiency. In MQA, the Key (K) and Value (V) matrices are shared across all attention heads. This means that for a given step $i$, there is only a single set of keys and values, denoted as $(\mathbf{K}_{\le i}, \mathbf{V}_{\le i})$. However, each of the $\tau$ heads maintains its own distinct query projection, allowing different heads to learn unique focuses while being more computationally and memory efficient than standard multi-head attention.
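The sketch below illustrates this sharing in PyTorch: every head gets its own query projection, while a single key projection and a single value projection are broadcast across all heads. It is a minimal illustration, not an implementation from the source; the class name, layer names, dimensions, and the causal mask are assumptions made for the example.

```python
# Minimal sketch of Multi-Query Attention (MQA) in PyTorch.
# Names (MultiQueryAttention, d_model, n_heads) are illustrative, not from the source.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Each head has its own query projection ...
        self.q_proj = nn.Linear(d_model, d_model)
        # ... but one key and one value projection are shared by all heads.
        self.k_proj = nn.Linear(d_model, self.d_head)
        self.v_proj = nn.Linear(d_model, self.d_head)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Queries: one set per head -> (b, n_heads, t, d_head)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Keys/values: a single shared set -> (b, 1, t, d_head), broadcast over heads
        k = self.k_proj(x).unsqueeze(1)
        v = self.v_proj(x).unsqueeze(1)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (b, n_heads, t, t)
        # Causal mask so position i attends only to the shared (K_{<=i}, V_{<=i})
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, self.n_heads * self.d_head)
        return self.out_proj(out)
```

Because only one set of keys and values exists per step, an autoregressive decoder caches a single K/V pair instead of one per head, which is the main source of MQA's memory savings during inference.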

Tags
  • Ch.2 Generative Models - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
Related
  • Multi-Query Attention (MQA)

  • Grouped-Query Attention (GQA)

  • Cross-layer Multi-head Attention

Learn After
  • Individual Attention Head Formula in Multi-Query Attention (MQA)