Explicit Enumeration of Causal Self-Attention Dot Products
In a causal self-attention mechanism, a query at a given position i can only interact with keys at positions j where j ≤ i. For a sequence of length 7 (indexed 0 to 6), this results in the following explicit set of query-key dot products being computed:
- q_0: q_0·k_0
- q_1: q_1·k_0, q_1·k_1
- q_2: q_2·k_0, q_2·k_1, q_2·k_2
- q_3: q_3·k_0, q_3·k_1, q_3·k_2, q_3·k_3
- q_4: q_4·k_0, q_4·k_1, q_4·k_2, q_4·k_3, q_4·k_4
- q_5: q_5·k_0, q_5·k_1, q_5·k_2, q_5·k_3, q_5·k_4, q_5·k_5
- q_6: q_6·k_0, q_6·k_1, q_6·k_2, q_6·k_3, q_6·k_4, q_6·k_5, q_6·k_6
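The enumeration above follows a simple pattern: query i pairs with every key from position 0 through i, giving n(n+1)/2 dot products in total. A minimal Python sketch (the function name `causal_pairs` is illustrative, not from the source):

```python
def causal_pairs(n):
    """Return the (i, j) index pairs for which q_i · k_j is computed
    under a causal mask (key position j must satisfy j <= i)."""
    return [(i, j) for i in range(n) for j in range(i + 1)]

pairs = causal_pairs(7)
print(len(pairs))  # 28 = 7 * 8 / 2 total dot products for length 7
```

For a sequence of length 5, the same count is 5·6/2 = 15, matching the review questions below.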
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Explicit Enumeration of Causal Self-Attention Dot Products
- An autoregressive model processes a sequence of tokens, where the query for a given token i (denoted q_i) can only interact with key vectors from positions j where j ≤ i. For the 4th token in a sequence (indexed as i = 3), which of the following dot product computations would not be performed?
- In a self-attention mechanism where the prediction for a token at position i can only depend on tokens from positions 0 up to and including i, what is the total number of query-key dot products computed for an entire input sequence of 5 tokens (indexed 0 to 4)?
- An autoregressive model processes a sequence of tokens, where the query for a given token i (denoted q_i) can only interact with key vectors from positions j where j ≤ i. Match each query vector with the complete set of dot products it computes.
Learn After
- In a self-attention mechanism where a query at a given position i can only interact with keys at positions j where j ≤ i, how many total query-key dot product computations are performed for an input sequence of length 5 (indexed 0 to 4)?
- Causal Self-Attention Dot Product Enumeration: For a sequence of 10 tokens (indexed 0 to 9), which of the following query-key dot product computations would be invalid in a self-attention mechanism where a query at a given position i can only interact with keys at positions j where j ≤ i?