1Cademy - In a Multi-Query Attention (MQA) layer, all attention heads share the same Key and Value matrices. The formula for the output of a single, specific head `j` at step `i` is given as: `head_j = Att_qkv(______, K_<=i, V_<=i)`. What component correctly fills the blank to represent the unique input for this specific head?

Learn Before

Individual Attention Head Formula in Multi-Query Attention (MQA)

Fill in the Blank

In a Multi-Query Attention (MQA) layer, all attention heads share the same Key and Value matrices. The output of a single, specific head `j` at step `i` is given by `head_j = Att_qkv(______, K_<=i, V_<=i)`. Which component fills the blank to represent the input that is unique to this head?
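
The setup in the question can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `att_qkv` is taken to be standard scaled dot-product attention, and the names `mqa_heads`, `K_shared`, `V_shared`, and `q_heads`, along with the toy values, are hypothetical and not from the original node. The point is that every head reads the same cached keys and values for steps up to `i`, while the first argument is different for each head.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def att_qkv(q, K, V):
    # Assumed form: scaled dot-product attention for one query vector q
    # over the rows of K (keys) and V (values) for steps <= i.
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
    weights = softmax(scores)
    return [sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))]

def mqa_heads(queries_per_head, K, V):
    # One output per head: the first argument varies with the head index j,
    # while K and V are the same objects for every head.
    return [att_qkv(q_j, K, V) for q_j in queries_per_head]

# Hypothetical toy values: 2 heads, 3 cached steps, dimension 2.
K_shared = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V_shared = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q_heads = [[1.0, 0.0], [0.0, 1.0]]

heads = mqa_heads(q_heads, K_shared, V_shared)
```

Because only the first argument differs between heads, the key/value cache is stored once per layer rather than once per head, which is the usual motivation for MQA.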

Updated 2025-10-08
