An engineer is analyzing the computational architecture of a large language model. They observe the following formula being used to calculate the output for an individual attention head j at a specific step i:
head_j = Attention(q_i^[j], K_<=i, V_<=i)
Based only on the components of this formula, what is the most accurate conclusion the engineer can draw about the relationship between the different attention heads in this layer?
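The telling detail in this formula is the placement of the head index: the query q_i^[j] is superscripted with [j], while K_<=i and V_<=i carry no head index at all. Read literally, every head computes its own query but attends over one shared set of keys and values, which is the multi-query attention pattern. A minimal NumPy sketch of that reading (head count, dimensions, and random inputs are illustrative assumptions, not details from the engineer's system):

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, K, V):
    # Single-query attention: q has shape (d,), K and V have shape (i, d).
    scores = K @ q / np.sqrt(q.shape[-1])  # similarity of q to each key, shape (i,)
    weights = softmax(scores)              # attention distribution over steps <= i
    return weights @ V                     # weighted sum of values, shape (d,)

# Illustrative sizes (assumptions): 4 heads, head dimension 8, current step i = 5.
n_heads, d, i = 4, 8, 5
rng = np.random.default_rng(0)

K_le_i = rng.standard_normal((i, d))  # one Key matrix shared by all heads
V_le_i = rng.standard_normal((i, d))  # one Value matrix shared by all heads

# Each head j contributes only its own query q_i^[j]; K and V are reused as-is.
head_outputs = [attention(rng.standard_normal(d), K_le_i, V_le_i) for j in range(n_heads)]
print(len(head_outputs), head_outputs[0].shape)  # 4 heads, each output of shape (8,)

Note that nothing inside attention() depends on j except the query passed in, mirroring how the formula gives each head a distinct q_i^[j] but an unindexed K_<=i and V_<=i.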
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of Attention Head Architectures
In a Multi-Query Attention (MQA) layer, all attention heads share the same Key and Value matrices. The formula for the output of a single, specific head j at step i is given as: head_j = Att_qkv(______, K_<=i, V_<=i). What component correctly fills the blank to represent the unique input for this specific head?
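For contrast, a standard multi-head attention layer would index every tensor by head, not just the query; a sketch of the two forms in the same plain notation used above:

head_j = Attention(q_i^[j], K_<=i^[j], V_<=i^[j])   (multi-head: per-head Keys and Values)
head_j = Att_qkv(q_i^[j], K_<=i, V_<=i)             (multi-query: shared Keys and Values)

Since the shared K_<=i and V_<=i are fixed for the whole layer in MQA, the query q_i^[j] is the only per-head input available to fill the blank.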