Learn Before
Interpreting Causal Attention Output
Based on the provided value vectors and attention weights, which token's meaning will have the most influence on the final attention output vector for the token at position 2? Justify your answer by explaining how the output is calculated.
0
1
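The idea behind the question can be sketched in code: in causal attention, the output for a token is a weighted sum of value vectors, so the token with the largest attention weight contributes most to the result. The weights and vectors below are illustrative placeholders, not the ones provided with this exercise.

```python
def attention_output(weights, values):
    """Weighted sum of value vectors: o = sum_j alpha_j * v_j."""
    dim = len(values[0])
    return [sum(a * v[d] for a, v in zip(weights, values)) for d in range(dim)]

# Hypothetical attention weights alpha_{2,j} and value vectors v_j.
weights = [0.2, 0.7, 0.1]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# The token at position 1 (weight 0.7) dominates this output.
print([round(x, 4) for x in attention_output(weights, values)])  # -> [0.3, 0.8]
```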
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In an autoregressive model, the attention output for a token is a weighted sum of the value vectors of itself and all preceding tokens. Consider a sequence of three tokens (at positions 0, 1, and 2). The value vectors are given as v_0 = [1, 2], v_1 = [3, 0], and v_2 = [4, 5]. The attention weights for the token at position 2, which determine the contribution of each token in the context, are α_2,0 = 0.1, α_2,1 = 0.6, and α_2,2 = 0.3. Based on this information, what is the attention output vector for the token at position 2?
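The weighted sum described above can be checked numerically with the vectors and weights given in this item:

```python
# Value vectors and attention weights from the exercise statement.
v = {0: [1, 2], 1: [3, 0], 2: [4, 5]}
alpha = {0: 0.1, 1: 0.6, 2: 0.3}  # alpha_{2,j} for j = 0, 1, 2

# Attention output for position 2: sum over j of alpha_{2,j} * v_j.
output = [sum(alpha[j] * v[j][d] for j in v) for d in range(2)]
print([round(x, 4) for x in output])  # -> [3.1, 1.7]
```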
Interpreting Causal Attention Output
Debugging a Causal Attention Calculation
Dense Attention Assumption