Formula

Sparse Attention with a Fixed Key-Value Subset

This form of attention restricts the query vector at position i, denoted q_i, to interact with a predefined, sparse subset of key-value pairs. Instead of attending over the entire history of keys and values (K_≤i, V_≤i), attention is computed only over a fixed set, for example {k_1, k_i} for keys and {v_1, v_i} for values, so the softmax normalization also runs only over that subset. The formula is expressed as: Att_qkv(q_i, {k_1, k_i}, {v_1, v_i}). Because the subset has a fixed size, the per-query cost stays constant rather than growing with i.
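Below is a minimal NumPy sketch of this idea; the function name att_qkv mirrors the notation above, and the single-head, unbatched setting with random toy vectors is an assumption for illustration, not an implementation from the book.

```python
import numpy as np

def att_qkv(q, keys, values):
    # Scaled dot-product attention of one query over a given
    # (possibly sparse) set of key-value pairs. The softmax is
    # normalized only over this subset, not the full history.
    K = np.stack(keys)                     # (m, d)
    V = np.stack(values)                   # (m, d)
    scores = K @ q / np.sqrt(q.shape[-1])  # (m,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                     # (d,)

rng = np.random.default_rng(0)
d, n, i = 8, 6, 5          # toy sizes; i is the current position (0-indexed here)
q = rng.standard_normal(d)        # query q_i
K = rng.standard_normal((n, d))   # rows are keys k_1 .. k_n
V = rng.standard_normal((n, d))   # rows are values v_1 .. v_n

# Attend only to {k_1, k_i} and {v_1, v_i} instead of all of K_<=i, V_<=i.
out = att_qkv(q, [K[0], K[i]], [V[0], V[i]])
print(out.shape)                  # (8,)
```

Only two rows of K and V are ever touched, which is what makes the pattern sparse: the work per query is bounded by the subset size, not the sequence length.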

