Applying Grouped-Query Attention with Causal Masking
Using the provided case study and the attention formula, identify the specific set of Key and Value vectors that the query vector $\mathbf{q}_5^{[8]}$ will interact with. Justify your answer by explaining how the indices $i$ and $j$ and the function $g(j)$ from the formula determine this outcome.
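As a rough aid for checking the index reasoning, the sketch below shows one way the formula's indices could be traced in code. It assumes 1-based indexing, reads the subscript $i$ as the token position and the superscript $j$ as the query head, and assumes that $g(j)$ assigns heads to groups in contiguous, equal-sized blocks. The names `g`, `visible_kv`, `n_heads`, and `n_groups`, as well as the values 8 heads and 2 groups, are hypothetical and are not taken from the case study; substitute the case study's own numbers.

```python
def g(j, n_heads, n_groups):
    """Map 1-based query head j to a 1-based KV group.

    Assumption: heads are split into contiguous, equal-sized groups.
    """
    if n_heads % n_groups != 0:
        raise ValueError("n_heads must be divisible by n_groups")
    heads_per_group = n_heads // n_groups
    return (j - 1) // heads_per_group + 1


def visible_kv(i, j, n_heads, n_groups):
    """Return (group, positions) that the query q_i^{[j]} interacts with."""
    group = g(j, n_heads, n_groups)      # selects K^{[g(j)]} and V^{[g(j)]}
    positions = list(range(1, i + 1))    # causal mask: only positions <= i
    return group, positions


# Hypothetical configuration: 8 query heads sharing 2 KV groups.
group, positions = visible_kv(i=5, j=8, n_heads=8, n_groups=2)
print(f"q_5^[8] uses K^[{group}] and V^[{group}] at positions {positions}")
# prints: q_5^[8] uses K^[2] and V^[2] at positions [1, 2, 3, 4, 5]
```

Under these assumed settings, the head index selects which group's keys and values are used, while the token index limits how many positions of that group are visible.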
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A transformer model calculates the output for a single attention head $j$ at token position $i$ using the formula: $\text{head}_j = \text{Att}_{\text{qkv}}(\mathbf{q}_i^{[j]}, \mathbf{K}_{\le i}^{[g(j)]}, \mathbf{V}_{\le i}^{[g(j)]})$, where $g(j)$ maps the query head $j$ to a specific group. What is the primary consequence of using the group-indexed key $\mathbf{K}_{\le i}^{[g(j)]}$ instead of a head-specific key $\mathbf{K}_{\le i}^{[j]}$?
Applying Grouped-Query Attention with Causal Masking
Deconstructing the GQA Formula with Causal Masking