Learn Before
An autoregressive model is generating a sequence of tokens one by one. It is currently calculating the attention output for the token at position 4 (i.e., the fifth token in the sequence). To ensure the model only uses information it has already seen, which set of key (K) and value (V) vectors must be used as input to the attention mechanism for the query vector at position 4 (q₄)?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Computational Cost per Token in Causal Attention
Reusability of Key-Value Pairs in Autoregressive Inference
Example of Query-Key Interactions in Causal Attention
Diagnosing Information Leakage in an Autoregressive Model
When calculating the attention output for the token at position i in an autoregressive model, the mechanism uses the query vector from that same position (q_i), while the key and value matrices are built from the vectors at positions 0 through i — the current token and every token before it. For the query at position 4 (q₄), this means the keys k₀ through k₄ and the values v₀ through v₄; no key or value from a later position may be included, or future information would leak into the prediction.
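The restriction above can be sketched in a few lines of NumPy. This is a minimal illustration, not a real model: the sequence length, head dimension, and projection matrices below are arbitrary toy values chosen for the example. It computes the attention output for position i = 4 using only the keys and values from positions 0 through 4.

```python
import numpy as np

# Toy setup (illustrative sizes, random weights -- not from any real model).
rng = np.random.default_rng(0)
d = 8                              # head dimension
seq = rng.normal(size=(10, d))     # hidden states for a 10-token sequence

W_q = rng.normal(size=(d, d))      # toy query/key/value projections
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

i = 4                              # position whose output we compute
q_i = seq[i] @ W_q                 # query for position 4 only
K = seq[: i + 1] @ W_k             # keys k_0 .. k_4 (positions 0..i)
V = seq[: i + 1] @ W_v             # values v_0 .. v_4 (positions 0..i)

scores = q_i @ K.T / np.sqrt(d)    # scaled dot-product scores, shape (5,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()           # softmax over the 5 visible positions
out = weights @ V                  # attention output for position 4, shape (d,)
```

Slicing `seq[: i + 1]` is what enforces causality here: position 4 never sees keys or values from positions 5 through 9, even though they exist in the buffer.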