Learn Before
Key Matrix for Causal Attention (K_≤i)
In causal or autoregressive attention mechanisms, the key matrix for a given position i, denoted K_≤i, is formed by vertically stacking all key vectors from the beginning of the sequence up to and including position i. This matrix represents the set of all keys that the query at position i is allowed to attend to. It is defined as:

K_≤i = [k_0; k_1; …; k_i]

where each k_j is a row vector, so K_≤i has i + 1 rows.
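The stacking described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the source; the helper name `keys_up_to` and the key dimension `d` are assumptions for the example.

```python
import numpy as np

def keys_up_to(keys, i):
    """Return K_<=i: the (i+1) x d matrix stacking keys k_0 ... k_i as rows."""
    return np.stack(keys[: i + 1], axis=0)

# Hypothetical key vectors of dimension d = 4 for a 6-token sequence.
d = 4
keys = [np.random.rand(d) for _ in range(6)]

K_le_2 = keys_up_to(keys, 2)
print(K_le_2.shape)  # (3, 4): rows are k_0, k_1, k_2
```

Note that the query at position i sees only the first i + 1 keys; keys at later positions are excluded, which is what enforces causality.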
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Set of Sequential Key-Value Pairs
Let a sequence of vectors be constructed where the first element is and the second element is . The third element has multiple potential versions, and the 5th version is given as . According to the notational definition , what is the specific sequence represented by when using the 5th version of the 3rd element?
Key Matrix for Causal Attention (K_≤i)
Deconstructing Vector Prefix Notation
Key-Value Cache
Consider a sequence of vectors represented as v_0, v_1, …, v_m. The notation v_≤1 represents the subsequence containing only the first two vectors, v_0 and v_1.
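Under zero-based indexing, the prefix notation ≤j selects the first j + 1 elements. A minimal sketch (the variable names are hypothetical, not from the source):

```python
# Hypothetical sequence of vectors v_0 .. v_3.
v = [[1, 0], [0, 1], [2, 2], [3, 3]]

def prefix(seq, j):
    """Return the subsequence v_<=j, i.e. [v_0, ..., v_j]."""
    return seq[: j + 1]

print(prefix(v, 1))  # [[1, 0], [0, 1]] -> the first two vectors
```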
Learn After
Causal Attention Input Structure
Enumeration of Dot Products in Causal Self-Attention
State Variables in Linear Attention (μ_i, ν_i)
In an autoregressive attention mechanism, a sequence of key vectors is generated. Given the first three key vectors k_0 = [1, 2], k_1 = [3, 4], and k_2 = [5, 6], which of the following matrices represents the complete set of keys that the query at position i = 2 is allowed to interact with?
Debugging a Causal Attention Implementation
In an autoregressive attention mechanism processing a sequence of 10 tokens (indexed 0 to 9), the matrix of key vectors used to compute the output for the token at position 3 is identical to the matrix of key vectors used for the token at position 7.
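The two questions above can be checked directly from the definition of K_≤i. The sketch below assumes zero-based positions and keys stacked as rows; the variable names are illustrative.

```python
import numpy as np

# First question: stack k_0, k_1, k_2 to get the keys visible at i = 2.
k0, k1, k2 = np.array([1, 2]), np.array([3, 4]), np.array([5, 6])
K_le_2 = np.stack([k0, k1, k2])  # rows are k_0, k_1, k_2

# Second question: for a 10-token sequence, K_<=3 stacks 4 keys while
# K_<=7 stacks 8, so the two matrices differ in shape and cannot match.
keys = [np.random.rand(2) for _ in range(10)]  # hypothetical keys
K_le_3 = np.stack(keys[:4])
K_le_7 = np.stack(keys[:8])
print(K_le_3.shape, K_le_7.shape)  # (4, 2) (8, 2)
```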