Learn Before
Sparse Attention with a Fixed Key-Value Subset
This form of attention mechanism restricts the query vector at a given position i, denoted q_i, to interact with a predefined, sparse subset of key-value pairs. Instead of attending to the entire history of keys and values (K_≤i, V_≤i), attention is computed only over a specific subset, such as the keys {k_1, k_i} and the values {v_1, v_i}. The operation is written as Att_qkv(q_i, {k_1, k_i}, {v_1, v_i}).
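A minimal sketch of this fixed-subset attention, assuming scaled dot-product attention over NumPy arrays; the head dimension d, the random vectors, and the function name att_qkv are illustrative assumptions, not taken from the card:

    import numpy as np

    def att_qkv(q_i, keys, values):
        # Scaled dot-product attention of a single query over a small,
        # fixed set of key-value pairs (here: {k_1, k_i} and {v_1, v_i}).
        d_k = q_i.shape[-1]
        scores = keys @ q_i / np.sqrt(d_k)   # one score per selected key
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()             # softmax over the subset only
        return weights @ values              # weighted sum of the selected values

    # Hypothetical setup: query at position i plus the two selected pairs.
    d = 4
    q_i = np.random.randn(d)
    k_1, k_i = np.random.randn(d), np.random.randn(d)
    v_1, v_i = np.random.randn(d), np.random.randn(d)

    out = att_qkv(q_i, np.stack([k_1, k_i]), np.stack([v_1, v_i]))

Because the subset has a fixed size, the amount of work here does not depend on how far into the sequence position i is.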
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A language model generates text token by token. At each step i, an attention operation computes an output using a query vector and a memory component. In a standard causal implementation, this memory component is defined as the complete set of key and value vectors from all previous steps (1 to i). Based on this definition, what is the direct relationship between the size of this memory component and the length of the generated sequence i? (A short sketch of this memory component follows this list.)
Sparse Attention with a Fixed Key-Value Subset
Evaluating Memory Models in Attention Mechanisms
Evaluating an Attention Mechanism for a Real-Time Application
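A minimal sketch of the memory component described in the first related question above, assuming a per-step key-value cache of NumPy vectors; the head dimension d and the loop length are illustrative assumptions, not taken from the card:

    import numpy as np

    d = 8                          # hypothetical head dimension
    cached_keys, cached_values = [], []

    for i in range(1, 6):          # generate a few tokens step by step
        k_i, v_i = np.random.randn(d), np.random.randn(d)
        cached_keys.append(k_i)    # the memory holds keys/values from steps 1..i
        cached_values.append(v_i)
        print(i, len(cached_keys)) # cache size equals i at every step

The printed cache size tracks i exactly, which is the relationship the question asks about.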
Learn After
An autoregressive model generates a sequence token by token. In a standard implementation, the query vector at position i (q_i) computes attention over the key-value pairs from all preceding positions, from 1 to i. Consider a modified implementation where the query q_i is restricted to attend only to the key-value pairs from the very first position (k_1, v_1) and its own current position (k_i, v_i). How does the computational cost of calculating the attention output for a single query q_i scale as the sequence length i grows very large (e.g., from 100 to 10,000)? (A cost-counting sketch follows this list.)
Trade-offs in Attention Mechanisms
Optimizing Attention for Long-Sequence Processing
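A back-of-the-envelope sketch contrasting the standard causal case with the fixed-subset case from the question above, assuming the work for one query is proportional to the number of attended keys times the head dimension; the dimension and the simple cost model are illustrative assumptions, not taken from the card:

    def attention_cost(num_keys, d):
        # Each attended key contributes one d-dimensional dot product,
        # so the work for a single query grows with num_keys * d.
        return num_keys * d

    d = 64                                  # hypothetical head dimension
    for i in (100, 10_000):
        full_cost = attention_cost(i, d)    # standard causal: attend to positions 1..i
        fixed_cost = attention_cost(2, d)   # fixed subset: attend only to {k_1, k_i}
        print(i, full_cost, fixed_cost)

Under this cost model, the standard causal cost grows with i while the fixed-subset cost stays constant as i goes from 100 to 10,000.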