
Sparse Attention Weights Assumption

In contrast to standard self-attention, sparse attention assumes that only some entries of the attention weight vector $\begin{bmatrix} \alpha_{i,0} & \dots & \alpha_{i,i} \end{bmatrix}$ are non-zero; the remaining entries are simply ignored in the computation. This is formalized by defining a set $G \subseteq \{0, \dots, i\}$ containing the indices of the non-zero entries. Consequently, any subsequent output calculation for position $i$ uses only the indices in $G$.
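A minimal sketch of this idea is given below, assuming scaled dot-product attention and illustrative names (sparse_attention_output, q_i, K, V, G) not taken from the source. Attention weights are computed only over the indices in $G$; all other positions are excluded from both the softmax and the output sum.

```python
import numpy as np

def sparse_attention_output(q_i, K, V, G):
    """Attention output for position i, restricted to key indices in G.

    q_i: query vector for position i, shape (d,)
    K:   keys for positions 0..i, shape (i+1, d)
    V:   values for positions 0..i, shape (i+1, d_v)
    G:   indices in {0, ..., i} whose attention weights are assumed
         non-zero; all other positions are ignored.
    """
    G = np.asarray(sorted(G))
    d = q_i.shape[-1]

    # Scores are computed only for the selected indices; the remaining
    # positions are treated as if their weights alpha_{i,j} were zero.
    scores = K[G] @ q_i / np.sqrt(d)          # shape (|G|,)
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()                    # softmax over G only

    # Output is a weighted sum of the selected value vectors.
    return alphas @ V[G]

# Example: position i = 5 attends only to indices {0, 3, 5}.
rng = np.random.default_rng(0)
d, i = 8, 5
q_i = rng.normal(size=d)
K = rng.normal(size=(i + 1, d))
V = rng.normal(size=(i + 1, d))
out = sparse_attention_output(q_i, K, V, G={0, 3, 5})
print(out.shape)  # (8,)
```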
