Learn Before

Attention Head Output in Grouped-Query Attention (GQA)

Multiple Choice

In a specific attention mechanism, there are 8 query heads (indexed j=1 to 8) and 2 distinct Key-Value (KV) groups (indexed g=1 to 2). Query heads 1 through 4 are assigned to KV group 1, while query heads 5 through 8 are assigned to KV group 2. The output for a given query head `j` is calculated based on its own query vector `q^[j]` and the Key-Value pair from its assigned group, `(K^[g(j)], V^[g(j)])`. Which Key-Value pair will query head 6 use for its computation?
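The head-to-group assignment described in the question can be sketched in a few lines of Python. This is only an illustration: the function name `kv_group` and its parameters are hypothetical, and the sketch assumes query heads are split into contiguous, equally sized blocks, which matches the 1-4 and 5-8 assignment stated above.

```python
def kv_group(j: int, num_query_heads: int = 8, num_kv_groups: int = 2) -> int:
    """Return the 1-indexed KV group g(j) for the 1-indexed query head j.

    Assumption: heads are assigned to groups in contiguous blocks of equal
    size, as in the question (heads 1-4 -> group 1, heads 5-8 -> group 2).
    """
    if num_query_heads % num_kv_groups != 0:
        raise ValueError("query heads must divide evenly among KV groups")
    if not 1 <= j <= num_query_heads:
        raise ValueError(f"head index {j} is out of range")
    heads_per_group = num_query_heads // num_kv_groups  # 4 in this setup
    return (j - 1) // heads_per_group + 1


# Full assignment table for the 8-head, 2-group setup in the question.
assignment = {j: kv_group(j) for j in range(1, 9)}
for head, group in assignment.items():
    print(f"query head {head} -> (K^[{group}], V^[{group}])")
```

Under these assumptions, `kv_group(6)` returns 2, so query head 6 reads from `(K^[2], V^[2])`, the pair shared by heads 5 through 8.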

Updated 2025-09-28
