Learn Before
A language model is computing an output vector with a sparse attention mechanism. For the current token, the computation considers only a subset of the previous tokens, identified by the index set G = {0, 2, 3}. Given the value vectors and corresponding attention weights below, what is the correct output vector?
Value Vectors:
- v_0 = [2, 1]
- v_1 = [4, 5]
- v_2 = [6, 0]
- v_3 = [1, 3]
Attention Weights for the included set G:
- α'_0 = 0.5
- α'_2 = 0.2
- α'_3 = 0.3
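The output is the weighted sum of the value vectors over the sparse index set G only; v_1 is skipped because index 1 is not in G, and the weights α' are already renormalized so they sum to 1 over G. A minimal sketch of this computation (variable names are illustrative, not from the question):

```python
# Sparse attention output: sum over the index set G only.
value_vectors = {0: [2, 1], 1: [4, 5], 2: [6, 0], 3: [1, 3]}
weights = {0: 0.5, 2: 0.2, 3: 0.3}  # renormalized over G = {0, 2, 3}

output = [0.0, 0.0]
for i, alpha in weights.items():   # v_1 never contributes: 1 not in G
    for d in range(2):
        output[d] += alpha * value_vectors[i][d]

print([round(x, 10) for x in output])  # → [2.5, 1.4]
```

Working it by hand: 0.5·[2, 1] + 0.2·[6, 0] + 0.3·[1, 3] = [1.0 + 1.2 + 0.3, 0.5 + 0.0 + 0.9] = [2.5, 1.4].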
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Comparison of Sparse and Dense Attention Weights
Analysis of Sparse Attention Formula Components
Analyzing the Impact of the Sparse Index Set