Learn Before
Analyzing the Impact of the Sparse Index Set
A language model is computing the output for a token at position i=4. The full set of available previous value vectors is {v_0, v_1, v_2, v_3}. Based on the provided sparse attention formula, identify which of these value vectors will be completely ignored in the calculation and explain your reasoning by referencing the components of the formula.
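The formula the card refers to is not reproduced in this export. Assuming the conventional sparse attention formulation, the output for position $i$ is

```latex
u_i = \sum_{j \in G_i} \alpha'_{i,j} \, v_j
```

where $G_i$ is the sparse index set for position $i$ and $\alpha'_{i,j}$ are the attention weights renormalized over $G_i$. Any value vector $v_j$ with $j \notin G_i$ receives no weight and is excluded from the sum entirely.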
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Comparison of Sparse and Dense Attention Weights
A language model is calculating an output vector using a sparse attention mechanism. The computation for the current token only considers a subset of previous tokens, identified by the index set G = {0, 2, 3}. Given the value vectors and corresponding attention weights below, what is the correct output vector?
Value Vectors:
- v_0 = [2, 1]
- v_1 = [4, 5]
- v_2 = [6, 0]
- v_3 = [1, 3]
Attention Weights for the included set G:
- α'_0 = 0.5
- α'_2 = 0.2
- α'_3 = 0.3
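Under the standard sparse attention sum, the output is the weighted sum of the value vectors whose indices appear in G; v_1 contributes nothing because 1 ∉ G. A minimal sketch (variable names are illustrative, not from the original card):

```python
# Value vectors and renormalized attention weights alpha'_j for j in G = {0, 2, 3}.
values = {0: [2, 1], 1: [4, 5], 2: [6, 0], 3: [1, 3]}
weights = {0: 0.5, 2: 0.2, 3: 0.3}

# Output vector: sum of alpha'_j * v_j over the included indices only.
output = [0.0, 0.0]
for j, alpha in weights.items():
    output = [o + alpha * v for o, v in zip(output, values[j])]

print([round(c, 2) for c in output])  # [2.5, 1.4]
```

Note that v_1 = [4, 5] never enters the loop: the sparse index set determines which previous tokens are attended to at all.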
Analysis of Sparse Attention Formula Components