Learn Before
A causal model is calculating the output for the token at position i=3. The model's attention mechanism is optimized to only consider a subset of previous positions. The set of contributing indices is G = {0, 2}. The attention weights for these indices are α_3,0 = 0.6 and α_3,2 = 0.4. The value vectors for the relevant positions are: v_0 = [1, 0], v_1 = [2, 2], and v_2 = [0, 3]. Based on this information, what is the final output vector for position 3?
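A short worked check of the question above, sketched in Python under the stated assumption that the output for position 3 is the weighted sum of value vectors over the contributing index set G only (positions outside G, such as v_1, get zero weight and do not contribute):

```python
# Sparse attention output: o_i = sum over j in G of alpha_{i,j} * v_j.
# Weights and vectors are taken directly from the question statement.
weights = {0: 0.6, 2: 0.4}                                # alpha_{3,j} for j in G
values = {0: [1.0, 0.0], 1: [2.0, 2.0], 2: [0.0, 3.0]}    # value vectors v_j

output = [0.0, 0.0]
for j, alpha in weights.items():
    # Accumulate alpha_{3,j} * v_j component-wise.
    output = [o + alpha * v for o, v in zip(output, values[j])]

print([round(x, 6) for x in output])  # -> [0.6, 1.2]
```

Note that v_1 = [2, 2] is a distractor: since 1 is not in G = {0, 2}, it never enters the sum, giving 0.6·[1, 0] + 0.4·[0, 3] = [0.6, 1.2].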
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sparse Attention Output Formula
Evaluating Vector Contributions in an Optimized Attention Mechanism
Selective Computation in Optimized Attention
Index Set of Non-Zero Attention Weights (G)