Learn Before
Index Set of Non-Zero Attention Weights (G)
In sparse attention, the set G denotes the specific subset of indices for which the attention weights are non-zero and will therefore be computed. For a given token at position i in a causal model, this set is a subset of all preceding positions, formally expressed as G ⊆ {0, 1, ..., i−1}. This set effectively defines the sparsity pattern by identifying which key-value pairs the current query will attend to.
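The definition above can be sketched in code: the output for position i is a weighted sum of value vectors taken only over the indices in G, with every position outside G skipped entirely. This is a minimal illustrative sketch, not an implementation from the course; the function name and the example numbers (i = 4, G = {0, 3}) are assumptions chosen for demonstration.

```python
# Sketch of one sparse attention output step (illustrative).
# Only positions j in G, with G a subset of {0, ..., i-1}, contribute;
# all other attention weights are treated as zero and never computed.

def sparse_attention_output(G, alpha, values):
    """Weighted sum of value vectors over the non-zero index set G.

    G      -- set of attended positions, each strictly less than i
    alpha  -- dict mapping each j in G to the attention weight alpha_{i,j}
    values -- list of value vectors, indexed by position
    """
    dim = len(values[0])
    out = [0.0] * dim
    for j in G:  # positions outside G are skipped, giving the sparsity
        for d in range(dim):
            out[d] += alpha[j] * values[j][d]
    return out

# Hypothetical example: the token at i = 4 attends only to positions 0 and 3.
o = sparse_attention_output(
    G={0, 3},
    alpha={0: 0.7, 3: 0.3},
    values=[[1.0, 2.0], [5.0, 5.0], [9.0, 9.0], [3.0, 0.0]],
)
# o is 0.7 * [1, 2] + 0.3 * [3, 0] = [1.6, 1.4]
```

Note that positions 1 and 2 never enter the sum, even though their value vectors exist; this is exactly what it means for their attention weights to be zero under the sparsity pattern defined by G.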
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Sparse Attention Output Formula
A causal model is calculating the output for the token at position i = 3. The model's attention mechanism is optimized to only consider a subset of previous positions. The set of contributing indices is G = {0, 2}. The attention weights for these indices are α_3,0 = 0.6 and α_3,2 = 0.4. The value vectors for the relevant positions are: v_0 = [1, 0], v_1 = [2, 2], and v_2 = [0, 3]. Based on this information, what is the final output vector for position 3?
Evaluating Vector Contributions in an Optimized Attention Mechanism
Selective Computation in Optimized Attention
Index Set of Non-Zero Attention Weights (G)
Learn After
A causal language model uses a sparse attention mechanism. When calculating the output for the token at position i = 10, the set of indices for the key-value pairs to be attended to is specified as G = {2, 5, 9}. Which of the following statements accurately describes the computation for the token at position 10?
A causal language model is using a sparse attention mechanism to compute the output for the token at position i = 8. The set G defines the indices of the key-value pairs that the current token will attend to. Which of the following options represents an invalid set G for this computation?
Analysis of Sparse Attention Patterns
Sparsity Level and the Size of Index Set