Learn Before
In a simplified attention mechanism, the history of key-value pairs up to a position i is summarized by two state variables: μ_i, which is the cumulative sum of outer products between transformed key vectors and their corresponding value vectors (Σ k'_jᵀ v_j), and ν_i, which is the cumulative sum of the transformed key vectors (Σ k'_jᵀ).
Given the following sequence of 2-dimensional vectors up to position i=2:
k'_0 = [1, 0], v_0 = [3, 4]
k'_1 = [0, 2], v_1 = [5, 6]
k'_2 = [1, 1], v_2 = [7, 8]
Calculate the state variables μ_2 and ν_2.
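One way to check the arithmetic is a short plain-Python sketch (the helper `outer` and all variable names are illustrative, not from the source): it accumulates the outer products k'_jᵀ v_j into μ_2 and the key vectors into ν_2.

```python
# Transformed keys k'_j and values v_j for j = 0, 1, 2
keys = [[1, 0], [0, 2], [1, 1]]
values = [[3, 4], [5, 6], [7, 8]]

def outer(k, v):
    """Outer product k^T v, returned as a list of rows."""
    return [[ki * vj for vj in v] for ki in k]

# mu_2: elementwise sum of the three outer-product matrices
mu_2 = [[0, 0], [0, 0]]
for k, v in zip(keys, values):
    m = outer(k, v)
    for r in range(2):
        for c in range(2):
            mu_2[r][c] += m[r][c]

# nu_2: elementwise sum of the transformed key vectors
nu_2 = [sum(col) for col in zip(*keys)]

print(mu_2)  # [[10, 12], [17, 20]]
print(nu_2)  # [2, 3]
```

So μ_2 = [[10, 12], [17, 20]] and ν_2 = [2, 3].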
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a specific type of attention mechanism, the history of key-value pairs up to a position i is summarized by two state variables: a matrix μ_i and a vector ν_i. They are defined as cumulative sums:
μ_i = Σ_{j=0 to i} (k'_jᵀ v_j) (sum of outer products)
ν_i = Σ_{j=0 to i} k'_jᵀ (sum of transformed key vectors)
Suppose you have already computed the state variables μ_i and ν_i for a sequence up to position i. To compute the next state variables, μ_{i+1} and ν_{i+1}, what is the only additional information you need?
Computational Advantage of State Variables
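The incremental update asked about in the related question can be sketched in plain Python (the helper `update_state` is a hypothetical name, not from the source). It shows that only the new pair (k'_{i+1}, v_{i+1}) is needed to advance the state:

```python
def update_state(mu, nu, k_new, v_new):
    """Advance the state by one position:
    mu_{i+1} = mu_i + outer(k'_{i+1}, v_{i+1}),
    nu_{i+1} = nu_i + k'_{i+1}.
    Only the new key-value pair is required, not the full history."""
    mu_next = [[mu[r][c] + k_new[r] * v_new[c] for c in range(len(v_new))]
               for r in range(len(k_new))]
    nu_next = [nu[j] + k_new[j] for j in range(len(k_new))]
    return mu_next, nu_next

# Example: step from i=1 to i=2 using the vectors from the question.
mu_1 = [[3, 4], [10, 12]]  # outer([1,0],[3,4]) + outer([0,2],[5,6])
nu_1 = [1, 2]              # [1,0] + [0,2]
mu_2, nu_2 = update_state(mu_1, nu_1, [1, 1], [7, 8])
print(mu_2)  # [[10, 12], [17, 20]]
print(nu_2)  # [2, 3]
```

This constant-size recurrence is the computational advantage: each step costs O(d²) regardless of sequence length, instead of re-summing over all previous positions.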