Learn Before
Selective Computation in Optimized Attention
An optimized attention mechanism is calculating the output for a token at position i=5. The set of indices designated to contribute to this calculation is G = {1, 2, 4}. Explain why the value vector for the token at position j=3, denoted as v_3, is not included in the final weighted sum.
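For reference, a minimal worked sketch of the reasoning, assuming the standard sparse attention output formula (the "Sparse Attention Output Formula" card listed under Related), in which the output at position i sums value vectors only over the index set G:

```latex
% Sparse attention output at position i = 5 with G = {1, 2, 4} (uses amsmath).
\begin{align*}
  \mathbf{o}_5 &= \sum_{j \in G} \alpha_{5,j}\,\mathbf{v}_j
                = \alpha_{5,1}\mathbf{v}_1 + \alpha_{5,2}\mathbf{v}_2 + \alpha_{5,4}\mathbf{v}_4 .
\end{align*}
% Because 3 \notin G, the weight \alpha_{5,3} is never computed (equivalently,
% it is zero), so \mathbf{v}_3 contributes nothing to the weighted sum.
```

Exclusion from G is what makes the computation selective: value vectors at omitted positions are skipped outright rather than merely down-weighted.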
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sparse Attention Output Formula
A causal model is calculating the output for the token at position i=3. The model's attention mechanism is optimized to only consider a subset of previous positions. The set of contributing indices is G = {0, 2}. The attention weights for these indices are α_3,0 = 0.6 and α_3,2 = 0.4. The value vectors for the relevant positions are: v_0 = [1, 0], v_1 = [2, 2], and v_2 = [0, 3]. Based on this information, what is the final output vector for position 3? (See the worked sketch after this list.)
Evaluating Vector Contributions in an Optimized Attention Mechanism
Index Set of Non-Zero Attention Weights (G)
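As flagged in the preview above, here is a worked sketch of the arithmetic for the "Evaluating Vector Contributions in an Optimized Attention Mechanism" question, under the same sparse attention sum (an unofficial check, not the card's answer):

```latex
% Output at position i = 3 with G = {0, 2}; v_1 is skipped since 1 \notin G.
\begin{align*}
  \mathbf{o}_3 &= \alpha_{3,0}\mathbf{v}_0 + \alpha_{3,2}\mathbf{v}_2 \\
               &= 0.6 \cdot [1, 0] + 0.4 \cdot [0, 3] \\
               &= [0.6, 0] + [0, 1.2] = [0.6, 1.2].
\end{align*}
```

Note that v_1 = [2, 2] never enters the sum even though position 1 precedes position 3, because 1 is not in G.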