Learn Before
Effect of Sparsity on Attention Weights
An attention mechanism calculates the following dense attention weights for a specific query token against four preceding key tokens:
- Token 1: 0.10
- Token 2: 0.40
- Token 3: 0.20
- Token 4: 0.30
The mechanism is then modified to be sparse, considering only Token 2 and Token 4. The attention scores are re-calculated and re-normalized over just this smaller set. Explain the fundamental reason why the new, sparse attention weight for Token 2 will be greater than its original dense weight of 0.40.
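A minimal numerical sketch of the idea, assuming the sparse step keeps the same underlying scores so that the softmax over the retained subset equals the dense weights renormalized over that subset (the numpy usage here is purely illustrative):

```python
import numpy as np

# Dense weights from the question, one per key token; they sum to 1.0.
dense = np.array([0.10, 0.40, 0.20, 0.30])

# The sparse variant retains only Token 2 and Token 4 (indices 1 and 3).
kept = [1, 3]

# Softmax restricted to a subset of the same logits equals the dense
# weights on that subset divided by their total mass, so renormalize directly.
sparse = dense[kept] / dense[kept].sum()

print(sparse)  # [0.5714 0.4286]: Token 2 rises from 0.40 to about 0.57
```

Because the retained mass 0.40 + 0.30 = 0.70 is strictly less than 1, dividing by it inflates every surviving weight; the probability mass given up by Tokens 1 and 3 is redistributed proportionally over the retained pair, which is what pushes Token 2 above 0.40.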
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An attention mechanism calculates normalized weights for a query token against four previous tokens, resulting in weights α₁, α₂, α₃, and α₄. A sparse version of the mechanism is then implemented that is constrained to consider only the second and fourth tokens. The attention scores are re-calculated and re-normalized over just this smaller set, resulting in new weights α'₂ and α'₄. Assuming all original weights were greater than zero, what is the relationship between the new weight for the second token, α'₂, and its original weight, α₂? (A short derivation is sketched after this list.)
Effect of Sparsity on Attention Weights
Analyzing Attention Weight Redistribution
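For the symbolic variant in the related item above, a short worked derivation, assuming as in the numerical sketch that the scores are unchanged and only the normalization set shrinks:

```latex
\[
\alpha'_2 = \frac{\alpha_2}{\alpha_2 + \alpha_4},
\qquad
0 < \alpha_2 + \alpha_4 = 1 - \alpha_1 - \alpha_3 < 1
\;\Longrightarrow\;
\alpha'_2 > \alpha_2 .
\]
```

The same argument gives α'₄ > α₄: since all original weights are positive, the denominator is strictly less than 1, so every retained weight grows.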