Learn Before
In an autoregressive attention mechanism, a sequence of key vectors is generated. Given the first three key vectors k_0 = [1, 2], k_1 = [3, 4], and k_2 = [5, 6], which of the following matrices represents the complete set of keys that the query at position i=2 is allowed to interact with?
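To make the causal constraint concrete, here is a minimal NumPy sketch (variable names are illustrative, not from the original card): under a causal mask, the query at position i may interact with keys k_0 through k_i, so for i = 2 the visible key matrix stacks all three keys row-wise.

```python
import numpy as np

# The first three key vectors from the question.
k0 = np.array([1, 2])
k1 = np.array([3, 4])
k2 = np.array([5, 6])

# Causal (autoregressive) attention: the query at position i may
# attend to keys k_0 .. k_i, i.e. all keys up to and including its
# own position, and none after it.
i = 2
keys = np.stack([k0, k1, k2])   # full key matrix, one key per row
K_visible = keys[: i + 1]       # keys the query at position i can see

print(K_visible)
# [[1 2]
#  [3 4]
#  [5 6]]
```

For i = 2 this slice covers the whole matrix; for an earlier position such as i = 1 the same slice would return only the first two rows.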
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Causal Attention Input Structure
Enumeration of Dot Products in Causal Self-Attention
State Variables in Linear Attention (μ_i, ν_i)
Debugging a Causal Attention Implementation
In an autoregressive attention mechanism processing a sequence of 10 tokens (indexed 0 to 9), the matrix of key vectors used to compute the output for the token at position 3 is identical to the matrix of key vectors used for the token at position 7.