Learn Before
Sparsity Level and the Size of Index Set
The degree of sparsity in a sparse attention model is directly determined by the size of the index set G. A smaller set G means fewer attention weights are computed, giving a higher degree of sparsity and greater computational efficiency. Conversely, a larger set G yields a denser, more expensive model.
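To make the idea concrete, here is a minimal sketch of computing one token's attention output over an index set G. The function name `sparse_attention_output` and the small random tensors are illustrative assumptions, not from the source; the point is that only |G| scores and weights are computed.

```python
import numpy as np

def sparse_attention_output(q, K, V, G):
    """Attend only to the key-value pairs whose indices are in G.

    q: (d,) query vector for the current token
    K: (n, d) key matrix; V: (n, d) value matrix
    G: list of indices of the key-value pairs to attend to
    """
    K_g = K[G]                                # (|G|, d) selected keys
    V_g = V[G]                                # (|G|, d) selected values
    scores = K_g @ q / np.sqrt(q.shape[0])    # only |G| scores computed
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over |G| entries only
    return weights @ V_g                      # (d,) weighted sum of |G| values

rng = np.random.default_rng(0)
d, n = 4, 10
q = rng.normal(size=d)
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))

# Smaller G -> fewer weights -> sparser, cheaper attention.
out = sparse_attention_output(q, K, V, G=[2, 5, 9])
```

With G = [2, 5, 9], only three attention weights are formed instead of one per preceding token.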
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A causal language model uses a sparse attention mechanism. When calculating the output for the token at position i = 10, the set of indices for the key-value pairs to be attended to is specified as G = {2, 5, 9}. Which of the following statements accurately describes the computation for the token at position 10?
A causal language model is using a sparse attention mechanism to compute the output for the token at position i = 8. The set G defines the indices of the key-value pairs that the current token will attend to. Which of the following options represents an invalid set G for this computation?
Analysis of Sparse Attention Patterns
Sparsity Level and the Size of Index Set
Learn After
An engineer is designing a text-generation model and is considering two different configurations for how each new token attends to previous tokens in the sequence.
- Configuration A: Each new token computes attention scores with only the 16 most recent tokens in the sequence.
- Configuration B: Each new token computes attention scores with all preceding tokens up to a maximum of 512.
Which statement best analyzes the primary trade-off between these two configurations?
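The trade-off in the question above can be sketched by counting how many attention scores each configuration computes per token. The helper `scores_per_token` and the sample positions are illustrative assumptions; the window size 16 and the 512-token cap come from the configurations described.

```python
def scores_per_token(position, window=None, max_len=512):
    """Number of attention scores computed for the token at `position`.

    A causal model can only attend to preceding tokens; `window` caps
    that to the most recent tokens (Configuration A), while `window=None`
    attends to all preceding tokens up to `max_len` (Configuration B).
    """
    visible = min(position, max_len)  # causal: only preceding tokens exist
    if window is not None:
        return min(visible, window)
    return visible

positions = [10, 100, 1000]
config_a = [scores_per_token(p, window=16) for p in positions]  # sliding window of 16
config_b = [scores_per_token(p) for p in positions]             # full, capped at 512
```

For long sequences, Configuration A's per-token cost stays constant at 16, while Configuration B's grows with the sequence length up to its 512-token cap, trading computation for a wider context.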
Analyzing Sparse Attention Trade-offs
Optimizing a Language Model for Real-Time Translation
In a sparse attention model, expanding the index set G to include more preceding tokens for each query will result in a higher degree of model sparsity and a reduction in computational cost.