Learn Before
Selective Computation in Optimized Attention
An optimized attention mechanism is calculating the output for a token at position i=5. The set of indices designated to contribute to this calculation is G = {1, 2, 4}. Explain why the value vector for the token at position j=3, denoted as v_3, is not included in the final weighted sum.
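For reference, a minimal worked sketch of the reasoning, assuming the standard sparse attention output formula (the "Sparse Attention Output Formula" card listed under Related), in which the output at position i sums value vectors only over the index set G:

```latex
% Sparse attention output at position i = 5 with G = {1, 2, 4} (uses amsmath).
\begin{align*}
  \mathbf{o}_5 &= \sum_{j \in G} \alpha_{5,j}\,\mathbf{v}_j
                = \alpha_{5,1}\mathbf{v}_1 + \alpha_{5,2}\mathbf{v}_2 + \alpha_{5,4}\mathbf{v}_4 .
\end{align*}
% Because 3 \notin G, the weight \alpha_{5,3} is never computed (equivalently,
% it is zero), so \mathbf{v}_3 contributes nothing to the weighted sum.
```

Exclusion from G is what makes the computation selective: value vectors at omitted positions are skipped outright rather than merely down-weighted.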
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sparse Attention Output Formula
A causal model is calculating the output for the token at position i=3. The model's attention mechanism is optimized to only consider a subset of previous positions. The set of contributing indices is G = {0, 2}. The attention weights for these indices are α_3,0 = 0.6 and α_3,2 = 0.4. The value vectors for the relevant positions are: v_0 = [1, 0], v_1 = [2, 2], and v_2 = [0, 3]. Based on this information, what is the final output vector for position 3? (See the worked sketch after this list.)
Evaluating Vector Contributions in an Optimized Attention Mechanism
Index Set of Non-Zero Attention Weights (G)
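As flagged in the preview above, here is a worked sketch of the arithmetic for the "Evaluating Vector Contributions in an Optimized Attention Mechanism" question, under the same sparse attention sum (an unofficial check, not the card's answer):

```latex
% Output at position i = 3 with G = {0, 2}; v_1 is skipped since 1 \notin G.
\begin{align*}
  \mathbf{o}_3 &= \alpha_{3,0}\mathbf{v}_0 + \alpha_{3,2}\mathbf{v}_2 \\
               &= 0.6 \cdot [1, 0] + 0.4 \cdot [0, 3] \\
               &= [0.6, 0] + [0, 1.2] = [0.6, 1.2].
\end{align*}
```

Note that v_1 = [2, 2] never enters the sum even though position 1 precedes position 3, because 1 is not in G.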