True/False

Consider an attention mechanism where the output of head j at position i is computed as head_j = Att_qkv(q_i^[j], K_<=i^[g(j)], V_<=i^[g(j)]). Here q_i^[j] is the query vector unique to head j, K_<=i and V_<=i are the keys and values for all positions up to i, and the function g(j) maps head j to a potentially shared key-value group.

Statement: If two distinct query heads, j1 and j2, are mapped to the same key-value group (meaning g(j1) = g(j2)), their final output vectors, head_j1 and head_j2, will necessarily be identical.
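The statement can be checked numerically. Below is a minimal sketch (the helper `att_qkv` and all shapes are illustrative assumptions, not from the source): two heads share one key-value group, but each uses its own query vector, so their attention weights, and therefore their outputs, differ.

```python
import numpy as np

def att_qkv(q, K, V):
    # Scaled dot-product attention for a single query vector q
    # over keys K and values V (illustrative helper, assumed here).
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
d = 8   # head dimension (assumed)
i = 5   # number of positions attended over, i.e. K_<=i, V_<=i

# One shared key-value group: g(j1) = g(j2)
K = rng.normal(size=(i, d))
V = rng.normal(size=(i, d))

# Distinct per-head queries: q_i^[j1] != q_i^[j2]
q_j1 = rng.normal(size=d)
q_j2 = rng.normal(size=d)

head_j1 = att_qkv(q_j1, K, V)
head_j2 = att_qkv(q_j2, K, V)

# Sharing K and V does not force identical outputs: the attention
# weights depend on the query, which differs between the two heads.
print(np.allclose(head_j1, head_j2))
```

With random queries the printed result is almost surely `False`, which is why the statement above does not hold in general: the heads coincide only in the degenerate case where their queries produce identical attention weights.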

Updated 2025-10-08


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science