In a self-attention mechanism where a query at a given position i can only interact with keys at positions j where j <= i, how many total query-key dot product computations are performed for an input sequence of length 5 (indexed 0 to 4)?
15. Each query at position i attends to keys at positions 0 through i, contributing i + 1 dot products, so the total is 1 + 2 + 3 + 4 + 5 = 15.
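A minimal enumeration sketch (plain Python; the function name is illustrative) that counts the valid query-key pairs under the causal constraint j <= i:

```python
def count_causal_dot_products(n):
    """Count query-key dot products in causal self-attention over n positions.

    Query i may only attend to keys j with j <= i, so position i
    contributes i + 1 dot products; the total is n * (n + 1) / 2.
    """
    return sum(1 for i in range(n) for j in range(n) if j <= i)

print(count_causal_dot_products(5))  # 15, i.e. 1 + 2 + 3 + 4 + 5
```

For the length-10 case in the related question below, the same count is 10 * 11 / 2 = 55, and any pair with j > i (e.g. query 3 with key 7) is invalid.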
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Causal Self-Attention Dot Product Enumeration: In a self-attention mechanism where a query at a given position i can only interact with keys at positions j where j <= i, how many total query-key dot product computations are performed for an input sequence of length 5 (indexed 0 to 4)?
For a sequence of 10 tokens (indexed 0 to 9), which of the following query-key dot product computations would be invalid in a self-attention mechanism where a query at a given position i can only interact with keys at positions j where j <= i?