1Cademy - Evaluating Vector Contributions in an Optimized Attention Mechanism

Learn Before

Sparse Attention Weights Assumption

Case Study

Evaluating Vector Contributions in an Optimized Attention Mechanism

Based on the provided scenario, what is the effective weight applied to the value vector v_5 when calculating the final output for position 8, and why?

Updated 2025-10-05

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

Sparse Attention Output Formula
A causal model is calculating the output for the token at position i=3. The model's attention mechanism is optimized to only consider a subset of previous positions. The set of contributing indices is G = {0, 2}. The attention weights for these indices are α_3,0 = 0.6 and α_3,2 = 0.4. The value vectors for the relevant positions are: v_0 = [1, 0], v_1 = [2, 2], and v_2 = [0, 3]. Based on this information, what is the final output vector for position 3?
Selective Computation in Optimized Attention
Index Set of Non-Zero Attention Weights ($G$)
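
The position i=3 scenario listed above can be checked with a small sketch. It assumes the sparse attention output is the weighted sum of the value vectors over the index set G only, so v_1 (index 1 is not in G) contributes nothing. The function name `sparse_attention_output` and the dictionary layout are hypothetical and chosen only for illustration.

```python
# Minimal sketch (hypothetical helper, not taken from the course material):
# the output at a position is assumed to be sum over j in G of alpha[j] * v[j].

def sparse_attention_output(weights, values):
    """Return the weighted sum of value vectors over the indices in `weights`.

    weights: dict mapping index j in G to its attention weight
    values:  dict mapping index j to its value vector (list of numbers);
             may contain indices outside G, which are simply never read
    """
    dim = len(next(iter(values.values())))
    out = [0.0] * dim
    for j, alpha in weights.items():   # iterate only over G
        for k in range(dim):
            out[k] += alpha * values[j][k]
    return out

# Numbers from the scenario: G = {0, 2}, weights 0.6 and 0.4.
weights = {0: 0.6, 2: 0.4}
values = {0: [1, 0], 1: [2, 2], 2: [0, 3]}

output = sparse_attention_output(weights, values)
print(output)  # approximately [0.6, 1.2], up to floating-point rounding
```

Under that assumption the result is 0.6·[1, 0] + 0.4·[0, 3] = [0.6, 1.2]. Any position outside G has an effective weight of zero, regardless of its value vector.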
