Learn Before
Analysis of Sparse Attention Formula Components
A language model computes the output vector for token position i by taking a weighted sum of value vectors from a predefined subset of previous token positions. The formula for this is: Output_i = Σ_{j ∈ G} α'_{i,j} v_j, where G is the set of included indices. If a new token position, k, is added to the set G, which term in the formula must be recomputed for all j in the newly expanded set, and why is this re-computation necessary for the formula to remain valid?
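The point of the question can be sketched in code. The sketch below assumes the standard construction in which the weights α'_{i,j} come from a softmax restricted to G (the raw scores and value vectors here are made-up illustration data, not from the card): because the softmax denominator sums over exactly the indices in G, adding a new position k changes that denominator, so every α'_{i,j} must be recomputed for the weights to remain a valid distribution summing to 1.

```python
import math

def sparse_attention(scores, values, G):
    """Softmax over only the included positions G, then a weighted sum of values.

    scores: dict j -> raw attention score s_{i,j} for the current token i
    values: dict j -> value vector v_j (list of floats)
    G:      set of included previous positions
    """
    # alpha'_{i,j} = exp(s_{i,j}) / sum_{m in G} exp(s_{i,m})
    # The denominator depends on the whole set G.
    denom = sum(math.exp(scores[j]) for j in G)
    alphas = {j: math.exp(scores[j]) / denom for j in G}
    dim = len(next(iter(values.values())))
    output = [sum(alphas[j] * values[j][d] for j in G) for d in range(dim)]
    return alphas, output

# Illustration data (hypothetical, not from the card):
scores = {0: 1.0, 1: 0.5, 2: 2.0}
values = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

a_before, _ = sparse_attention(scores, values, {0, 2})
a_after, _ = sparse_attention(scores, values, {0, 1, 2})  # position k = 1 added

# Every existing weight shrinks once k joins G, since the denominator grows;
# both weight sets still sum to 1 over their respective G.
```

Comparing `a_before[0]` with `a_after[0]` shows that the weight on an unchanged position j is different after k is added, which is exactly the re-computation the question asks about.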
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Comparison of Sparse and Dense Attention Weights
A language model is calculating an output vector using a sparse attention mechanism. The computation for the current token only considers a subset of previous tokens, identified by the index set G = {0, 2, 3}. Given the value vectors and corresponding attention weights below, what is the correct output vector?
Value Vectors:
- v_0 = [2, 1]
- v_1 = [4, 5]
- v_2 = [6, 0]
- v_3 = [1, 3]
Attention Weights for the included set G:
- α'_0 = 0.5
- α'_2 = 0.2
- α'_3 = 0.3
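The weighted sum above can be checked directly. This is a minimal worked computation using the card's own numbers, with the result rounded to absorb floating-point noise:

```python
# Value vectors and sparse attention weights from the question; G = {0, 2, 3},
# so v_1 is excluded from the sum.
v = {0: [2, 1], 1: [4, 5], 2: [6, 0], 3: [1, 3]}
alpha = {0: 0.5, 2: 0.2, 3: 0.3}

# Output = sum over j in G of alpha'_j * v_j, computed per coordinate.
output = [sum(alpha[j] * v[j][d] for j in alpha) for d in range(2)]
print([round(x, 6) for x in output])  # [2.5, 1.4]
```

Coordinate by coordinate: 0.5·2 + 0.2·6 + 0.3·1 = 2.5 and 0.5·1 + 0.2·0 + 0.3·3 = 1.4, so the output vector is [2.5, 1.4].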
Analysis of Sparse Attention Formula Components
Analyzing the Impact of the Sparse Index Set