Learn Before
Evaluating Vector Contributions in an Optimized Attention Mechanism
Based on the provided scenario, what is the effective weight applied to the value vector v_5 when calculating the final output for position 8, and why?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sparse Attention Output Formula
A causal model is calculating the output for the token at position i = 3. The model's attention mechanism is optimized to only consider a subset of previous positions. The set of contributing indices is G = {0, 2}. The attention weights for these indices are α_3,0 = 0.6 and α_3,2 = 0.4. The value vectors for the relevant positions are: v_0 = [1, 0], v_1 = [2, 2], and v_2 = [0, 3]. Based on this information, what is the final output vector for position 3?
Evaluating Vector Contributions in an Optimized Attention Mechanism
Selective Computation in Optimized Attention
Index Set of Non-Zero Attention Weights (G)
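
The position-3 question in the Related list can be worked through numerically. The sketch below assumes the usual sparse attention form, in which the output is the weighted sum of value vectors over the contributing indices only, so any position outside the index set (here v_1) has an effective weight of 0. The function name `sparse_attention_output` and the dictionary layout are hypothetical, chosen only for illustration.

```python
def sparse_attention_output(weights, values):
    """Weighted sum of value vectors over the contributing indices only.

    weights: dict mapping a contributing index j to its attention weight
    values:  dict mapping every available index to its value vector
    Indices missing from `weights` are skipped, i.e. treated as weight 0.
    """
    dim = len(next(iter(values.values())))
    out = [0.0] * dim
    for j, alpha in weights.items():
        for d, component in enumerate(values[j]):
            out[d] += alpha * component
    return out


# Numbers taken from the position-3 question: only indices 0 and 2 contribute.
weights = {0: 0.6, 2: 0.4}
values = {0: [1, 0], 1: [2, 2], 2: [0, 3]}

output = sparse_attention_output(weights, values)
print(output)  # approximately [0.6, 1.2], up to floating-point rounding
```

By hand, this is 0.6 · [1, 0] + 0.4 · [0, 3] = [0.6, 1.2]. The vector v_1 = [2, 2] never enters the sum because index 1 is not among the contributing indices.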