Learn Before
Analysis of Sparse Attention Formula Components
A language model computes the output vector for token position i by taking a weighted sum of value vectors from a predefined subset of previous token positions. The formula for this is: Output_i = Σ_{j ∈ G} α'_{i,j} v_j, where G is the set of included indices. If a new token position, k, is added to the set G, which term in the formula must be recomputed for all j in the newly expanded set, and why is this re-computation necessary for the formula to remain valid?
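The point of the question can be sketched in code. The sketch below assumes the standard construction in which the weights α'_{i,j} come from a softmax restricted to G (the raw scores and value vectors here are made-up illustration data, not from the card): because the softmax denominator sums over exactly the indices in G, adding a new position k changes that denominator, so every α'_{i,j} must be recomputed for the weights to remain a valid distribution summing to 1.

```python
import math

def sparse_attention(scores, values, G):
    """Softmax over only the included positions G, then a weighted sum of values.

    scores: dict j -> raw attention score s_{i,j} for the current token i
    values: dict j -> value vector v_j (list of floats)
    G:      set of included previous positions
    """
    # alpha'_{i,j} = exp(s_{i,j}) / sum_{m in G} exp(s_{i,m})
    # The denominator depends on the whole set G.
    denom = sum(math.exp(scores[j]) for j in G)
    alphas = {j: math.exp(scores[j]) / denom for j in G}
    dim = len(next(iter(values.values())))
    output = [sum(alphas[j] * values[j][d] for j in G) for d in range(dim)]
    return alphas, output

# Illustration data (hypothetical, not from the card):
scores = {0: 1.0, 1: 0.5, 2: 2.0}
values = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

a_before, _ = sparse_attention(scores, values, {0, 2})
a_after, _ = sparse_attention(scores, values, {0, 1, 2})  # position k = 1 added

# Every existing weight shrinks once k joins G, since the denominator grows;
# both weight sets still sum to 1 over their respective G.
```

Comparing `a_before[0]` with `a_after[0]` shows that the weight on an unchanged position j is different after k is added, which is exactly the re-computation the question asks about.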
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Comparison of Sparse and Dense Attention Weights
A language model is calculating an output vector using a sparse attention mechanism. The computation for the current token only considers a subset of previous tokens, identified by the index set G = {0, 2, 3}. Given the value vectors and corresponding attention weights below, what is the correct output vector?
Value Vectors:
- v_0 = [2, 1]
- v_1 = [4, 5]
- v_2 = [6, 0]
- v_3 = [1, 3]
Attention Weights for the included set G:
- α'_0 = 0.5
- α'_2 = 0.2
- α'_3 = 0.3
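The weighted sum above can be checked directly. This is a minimal worked computation using the card's own numbers, with the result rounded to absorb floating-point noise:

```python
# Value vectors and sparse attention weights from the question; G = {0, 2, 3},
# so v_1 is excluded from the sum.
v = {0: [2, 1], 1: [4, 5], 2: [6, 0], 3: [1, 3]}
alpha = {0: 0.5, 2: 0.2, 3: 0.3}

# Output = sum over j in G of alpha'_j * v_j, computed per coordinate.
output = [sum(alpha[j] * v[j][d] for j in alpha) for d in range(2)]
print([round(x, 6) for x in output])  # [2.5, 1.4]
```

Coordinate by coordinate: 0.5·2 + 0.2·6 + 0.3·1 = 2.5 and 0.5·1 + 0.2·0 + 0.3·3 = 1.4, so the output vector is [2.5, 1.4].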
Analysis of Sparse Attention Formula Components
Analyzing the Impact of the Sparse Index Set