
Sets of Keys and Values in Grouped-Query Attention (GQA)

In Grouped-Query Attention (GQA), the attention heads are partitioned into $n_g$ distinct groups, and all heads within a group share the same key and value vectors. Consequently, the attention layer maintains $n_g$ distinct sets of keys and values, denoted $\{(\mathbf{K}_{\le i}^{[1]},\mathbf{V}_{\le i}^{[1]}), \dots, (\mathbf{K}_{\le i}^{[n_g]},\mathbf{V}_{\le i}^{[n_g]})\}$. The function $g(j)$ identifies the group to which the $j$-th head is assigned, so head $j$ attends using the pair $(\mathbf{K}_{\le i}^{[g(j)]},\mathbf{V}_{\le i}^{[g(j)]})$.
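
The head-to-group mapping can be illustrated with a minimal Python sketch. It assumes heads are split into contiguous, equally sized groups, which is one common choice but is not stated in the text above. The names `head_to_group`, `kv_sets`, and `kv_for_head` are hypothetical and used only for illustration, and strings stand in for the cached key and value matrices.

```python
def head_to_group(j: int, n_heads: int, n_groups: int) -> int:
    """Return g(j), the 1-based group index of the 1-based head j.

    Assumption: heads are split into contiguous, equally sized groups.
    The definition of GQA only requires that some mapping g(j) exists.
    """
    if n_heads % n_groups != 0:
        raise ValueError("n_heads must be divisible by n_groups")
    if not 1 <= j <= n_heads:
        raise ValueError("head index out of range")
    heads_per_group = n_heads // n_groups
    return (j - 1) // heads_per_group + 1


n_heads, n_groups = 8, 2

# One (K, V) pair per group; strings stand in for the cached matrices.
kv_sets = {k: (f"K[{k}]", f"V[{k}]") for k in range(1, n_groups + 1)}

# Each head j reads the pair stored for its group g(j).
assignment = {j: head_to_group(j, n_heads, n_groups) for j in range(1, n_heads + 1)}
kv_for_head = {j: kv_sets[g] for j, g in assignment.items()}
```

With 8 heads and 2 groups, heads 1 to 4 read the first key-value set and heads 5 to 8 read the second, so only 2 sets are stored instead of 8. Setting `n_groups` equal to `n_heads` recovers standard multi-head attention, and setting it to 1 recovers multi-query attention.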

Updated 2026-04-23

Tags

Foundations of Large Language Models

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
