Learn Before
Index Set of Non-Zero Attention Weights (G)
In sparse attention, the set G denotes the specific subset of indices for which the attention weights are non-zero and will therefore be computed. For a given token at position i in a causal model, this set is a subset of all preceding positions, formally expressed as G ⊆ {0, 1, ..., i−1}. This set effectively defines the sparsity pattern by identifying which key-value pairs the current query will attend to.
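The definition above can be sketched in code: the output for position i is a weighted sum of value vectors taken only over the indices in G, with every position outside G skipped entirely. This is a minimal illustrative sketch, not an implementation from the course; the function name and the example numbers (i = 4, G = {0, 3}) are assumptions chosen for demonstration.

```python
# Sketch of one sparse attention output step (illustrative).
# Only positions j in G, with G a subset of {0, ..., i-1}, contribute;
# all other attention weights are treated as zero and never computed.

def sparse_attention_output(G, alpha, values):
    """Weighted sum of value vectors over the non-zero index set G.

    G      -- set of attended positions, each strictly less than i
    alpha  -- dict mapping each j in G to the attention weight alpha_{i,j}
    values -- list of value vectors, indexed by position
    """
    dim = len(values[0])
    out = [0.0] * dim
    for j in G:  # positions outside G are skipped, giving the sparsity
        for d in range(dim):
            out[d] += alpha[j] * values[j][d]
    return out

# Hypothetical example: the token at i = 4 attends only to positions 0 and 3.
o = sparse_attention_output(
    G={0, 3},
    alpha={0: 0.7, 3: 0.3},
    values=[[1.0, 2.0], [5.0, 5.0], [9.0, 9.0], [3.0, 0.0]],
)
# o is 0.7 * [1, 2] + 0.3 * [3, 0] = [1.6, 1.4]
```

Note that positions 1 and 2 never enter the sum, even though their value vectors exist; this is exactly what it means for their attention weights to be zero under the sparsity pattern defined by G.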
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Sparse Attention Output Formula
A causal model is calculating the output for the token at position i = 3. The model's attention mechanism is optimized to only consider a subset of previous positions. The set of contributing indices is G = {0, 2}. The attention weights for these indices are α_3,0 = 0.6 and α_3,2 = 0.4. The value vectors for the relevant positions are: v_0 = [1, 0], v_1 = [2, 2], and v_2 = [0, 3]. Based on this information, what is the final output vector for position 3?
Evaluating Vector Contributions in an Optimized Attention Mechanism
Selective Computation in Optimized Attention
Index Set of Non-Zero Attention Weights (G)
Learn After
A causal language model uses a sparse attention mechanism. When calculating the output for the token at position i = 10, the set of indices for the key-value pairs to be attended to is specified as G = {2, 5, 9}. Which of the following statements accurately describes the computation for the token at position 10?
A causal language model is using a sparse attention mechanism to compute the output for the token at position i = 8. The set G defines the indices of the key-value pairs that the current token will attend to. Which of the following options represents an invalid set G for this computation?
Analysis of Sparse Attention Patterns
Sparsity Level and the Size of Index Set