An autoregressive model is processing a sequence of 5 tokens, indexed 0 through 4. The model's attention mechanism is constrained so that any given token can only attend to itself and to tokens that appeared earlier in the sequence. Which of the following diagrams correctly visualizes the set of all required dot product calculations between query vectors (q, representing each token's perspective) and key vectors (k, representing each token's content)? An 'X' marks a calculation that is performed.
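The causal pattern the question describes can be sketched directly: token i's query is dotted only with keys 0 through i. Below is a minimal Python sketch that derives and renders this pattern for 5 tokens, using the question's convention of an 'X' for a performed calculation (the row/column labels are illustrative, not from any particular library).

```python
# Derive the causal (lower-triangular) attention pattern for 5 tokens.
# Query i may only be dotted with keys 0..i (itself and earlier tokens).
n = 5
mask = [[j <= i for j in range(n)] for i in range(n)]

# Render the pattern: rows are queries q0..q4, columns are keys k0..k4.
print("    " + "  ".join(f"k{j}" for j in range(n)))
for i, row in enumerate(mask):
    print(f"q{i}  " + "   ".join("X" if allowed else "." for allowed in row))
```

The correct diagram is therefore lower-triangular: row q0 has one 'X', row q4 has five, for n(n+1)/2 = 15 dot products in total.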
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Related
Total Attention Score Calculations
An autoregressive model is processing a sequence of 6 tokens, indexed 0 through 5. The model uses an attention mechanism where a query from a specific token position can only interact with keys from the same or preceding positions. Match each query vector to the complete set of key vectors it will be multiplied with to calculate attention scores.
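The matching this related question asks for can be sketched the same way: under a causal mask, query i interacts with exactly the keys at positions 0 through i. A minimal Python sketch, with illustrative q/k labels:

```python
# For 6 tokens under a causal mask, list the complete set of keys
# each query is multiplied with to compute attention scores.
n = 6
pairs = {f"q{i}": [f"k{j}" for j in range(i + 1)] for i in range(n)}

for q, ks in pairs.items():
    print(f"{q} -> {', '.join(ks)}")

# The total number of score calculations is n*(n+1)/2.
total = sum(len(ks) for ks in pairs.values())
print(f"Total attention score calculations: {total}")
```

For n = 6 this gives q0 → {k0} up through q5 → {k0, …, k5}, and 6·7/2 = 21 calculations in total.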