Learn Before
  • Multi-Query Attention (MQA)

Individual Attention Head Formula in Multi-Query Attention (MQA)

In Multi-Query Attention (MQA), the output of an individual head $j$ is calculated using that head's own query vector, $\mathbf{q}_i^{[j]}$, together with the Key and Value matrices, $\mathbf{K}_{\le i}$ and $\mathbf{V}_{\le i}$, which are shared across all heads. This is represented by the formula: $\text{head}_j = \text{Att}_{\text{qkv}}(\mathbf{q}_i^{[j]}, \mathbf{K}_{\le i}, \mathbf{V}_{\le i})$
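The per-head formula can be illustrated with a small, dependency-free Python sketch. It assumes that $\text{Att}_{\text{qkv}}$ is standard scaled dot-product attention (softmax over $\mathbf{q}\mathbf{K}^\top / \sqrt{d}$), which this card does not spell out. The function names `att_qkv` and `mqa_heads` and the toy numbers are hypothetical, chosen only for illustration. Causality is implicit here: `K` and `V` are taken to hold only the rows for positions up to $i$.

```python
import math


def att_qkv(q, K, V):
    """Attention output for a single query vector.

    Assumption: scaled dot-product attention with a softmax over positions.
    q: list of d floats (one head's query at position i)
    K: list of rows, each with d floats (keys for positions <= i)
    V: list of rows, each with d_v floats (values for positions <= i)
    """
    d = len(q)
    # One scaled dot-product score per cached position.
    scores = [sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d) for k in K]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value rows, column by column.
    d_v = len(V[0])
    return [sum(w * v[c] for w, v in zip(weights, V)) for c in range(d_v)]


def mqa_heads(queries, K, V):
    """Compute head_j for every head j.

    Each head has its own query vector, but all heads read the same K and V,
    which is the defining property of MQA.
    """
    return [att_qkv(q_j, K, V) for q_j in queries]


# Toy example: 3 cached positions, 2 heads, d = d_v = 2 (illustrative values).
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]  # q_i^{[1]} and q_i^{[2]}

heads = mqa_heads(queries, K, V)
for j, head in enumerate(heads, start=1):
    print(f"head_{j} =", head)
```

Because `K` and `V` are passed unchanged to every head, only the query differs between heads. In this sketch the heads still produce different outputs because each query weights the shared positions differently.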

13 days ago

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
