1Cademy - Analysis of Attention Head Architectures

Learn Before

Individual Attention Head Formula in Multi-Query Attention (MQA)

Essay

Analysis of Attention Head Architectures

Imagine two different designs for a model's attention component. In 'Design 1', each attention head calculates its output using its own unique Query, Key, and Value vectors. In 'Design 2', each attention head still uses its own unique Query vector, but all heads must use a single, shared set of Key and Value vectors. Based on this information, analyze the fundamental difference in the inputs provided to a single attention head in 'Design 2' compared to 'Design 1'. What is the primary structural consequence of adopting 'Design 2'?
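The contrast between the two designs can be made concrete with a minimal pure-Python sketch. It assumes that each head applies standard scaled dot-product attention, that 'Design 1' corresponds to conventional multi-head attention, and that 'Design 2' corresponds to multi-query attention (MQA). The function names `attend`, `design1`, `design2`, and `kv_tensors_stored` are hypothetical and introduced only for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention for ONE head: a single query vector
    # attends over a sequence of key vectors and value vectors.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]

def design1(queries, keys_per_head, values_per_head):
    # Design 1 (assumed: multi-head attention).
    # Head i receives its own query AND its own keys and values.
    return [attend(q, k, v)
            for q, k, v in zip(queries, keys_per_head, values_per_head)]

def design2(queries, shared_keys, shared_values):
    # Design 2 (assumed: multi-query attention).
    # Head i receives its own query, but every head reads the same keys and values.
    return [attend(q, shared_keys, shared_values) for q in queries]

def kv_tensors_stored(num_heads, shared):
    # Number of distinct key/value tensors the layer must compute and keep.
    return 2 if shared else 2 * num_heads

# Toy inputs (illustrative values only): 2 heads, 3 positions, dimension 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
shared_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shared_values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

outputs_design2 = design2(queries, shared_keys, shared_values)
```

In this sketch the only input that distinguishes one head from another in `design2` is its query. The keys and values are computed and stored once rather than once per head, so the count returned by `kv_tensors_stored` drops from `2 * num_heads` to `2`.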

Updated 2025-09-28
