An autoregressive language model is in the process of generating a sequence of tokens. When a single attention head calculates its output for the 4th token in the sequence, which set of key and value vectors does it use to ensure it only relies on previously generated information?
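A minimal sketch of the mechanism the question refers to, assuming the usual causal-attention convention in which position i attends to positions 1 through i (itself included) with scaled dot-product scores. The function name `causal_attention_output` and the toy vectors are hypothetical, chosen only for illustration.

```python
import math

def causal_attention_output(queries, keys, values, i):
    """Single-head output at 1-based position i, using only positions 1..i."""
    q = queries[i - 1]
    d = len(q)
    # Causal restriction: slice off every key/value after position i.
    visible_keys = keys[:i]
    visible_values = values[:i]
    # Scaled dot-product score between q_i and each visible key.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible_keys]
    # Numerically stable softmax over the visible positions only.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the visible value vectors.
    dim_v = len(visible_values[0])
    output = [sum(w * v[j] for w, v in zip(weights, visible_values)) for j in range(dim_v)]
    return output, weights

# Toy sequence of 6 tokens with 2-dimensional vectors (made-up numbers).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 0.0], [0.0, 2.0]]
keys = [[0.2, 0.1], [0.4, -0.3], [-0.5, 0.6], [0.9, 0.0], [0.3, 0.3], [-0.7, 0.8]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]

output_4, weights_4 = causal_attention_output(queries, keys, values, 4)
print(len(weights_4))  # 4: one weight each for tokens 1, 2, 3 and 4
```

Under this assumption, the output for the 4th token depends only on k_1 to k_4 and v_1 to v_4. Changing the keys or values of tokens 5 and 6 leaves `output_4` unchanged, because q_4 never forms a score with k_5 or k_6.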
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
True or False: In a causal attention mechanism, when a single attention head is calculating the output for the 4th token in a sequence, the query vector for that 4th token (q_4) will interact with the key vector from the 6th token (k_6) to compute an attention score.
Causal Attention Inputs