Learn Before
In a specific type of attention mechanism, the history of key-value pairs up to a position i is summarized by two state variables: a matrix μ_i and a vector ν_i. They are defined as cumulative sums:
μ_i = Σ_{j=0 to i} (k'_jᵀ v_j) (sum of outer products)
ν_i = Σ_{j=0 to i} (k'_jᵀ) (sum of transformed key vectors)
Suppose you have already computed the state variables μ_i and ν_i for a sequence up to position i. To compute the next state variables, μ_{i+1} and ν_{i+1}, what is the only additional information you need?
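The incremental update implied by the cumulative-sum definitions can be sketched as follows. This is a minimal illustration, not code from the course; the function name, numpy usage, and the sample vectors in the usage example are assumptions.

```python
import numpy as np

def update_state(mu, nu, k_next, v_next):
    """Advance the cumulative-sum state by one position.

    mu_{i+1} = mu_i + outer(k'_{i+1}, v_{i+1})
    nu_{i+1} = nu_i + k'_{i+1}
    Only the next pair (k'_{i+1}, v_{i+1}) is needed, not the full history.
    """
    return mu + np.outer(k_next, v_next), nu + k_next

# Usage: start from the state at position 0, then advance one step.
mu0 = np.outer([1.0, 0.0], [3.0, 4.0])  # mu_0 = outer(k'_0, v_0)
nu0 = np.array([1.0, 0.0])              # nu_0 = k'_0
mu1, nu1 = update_state(mu0, nu0, np.array([0.0, 2.0]), np.array([5.0, 6.0]))
print(mu1)  # mu_1 = [[3, 4], [10, 12]]
print(nu1)  # nu_1 = [1, 2]
```

Because each step touches only the previous state and one new key-value pair, the update costs O(1) in sequence length.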
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a simplified attention mechanism, the history of key-value pairs up to a position i is summarized by two state variables: μ_i, which is the cumulative sum of outer products between transformed key vectors and their corresponding value vectors (Σ k'_jᵀ v_j), and ν_i, which is the cumulative sum of the transformed key vectors (Σ k'_jᵀ).

Given the following sequence of 2-dimensional vectors up to position i=2:
k'_0 = [1, 0], v_0 = [3, 4]
k'_1 = [0, 2], v_1 = [5, 6]
k'_2 = [1, 1], v_2 = [7, 8]

Calculate the state variables μ_2 and ν_2.

In a specific type of attention mechanism, the history of key-value pairs up to a position i is summarized by two state variables: a matrix μ_i and a vector ν_i. They are defined as cumulative sums:
μ_i = Σ_{j=0 to i} (k'_jᵀ v_j) (sum of outer products)
ν_i = Σ_{j=0 to i} (k'_jᵀ) (sum of transformed key vectors)

Suppose you have already computed the state variables μ_i and ν_i for a sequence up to position i. To compute the next state variables, μ_{i+1} and ν_{i+1}, what is the only additional information you need?

Computational Advantage of State Variables
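The calculation card in the Related section above can be checked numerically. This is a sketch using the card's own vectors; the numpy-based accumulation is an illustrative assumption, not the course's reference implementation.

```python
import numpy as np

# k'_0, k'_1, k'_2 and v_0, v_1, v_2 from the calculation card
keys = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
values = np.array([[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])

mu = np.zeros((2, 2))  # running sum of outer products k'_j^T v_j
nu = np.zeros(2)       # running sum of transformed key vectors k'_j
for k, v in zip(keys, values):
    mu += np.outer(k, v)
    nu += k

print(mu)  # mu_2 = [[10, 12], [17, 20]]
print(nu)  # nu_2 = [2, 3]
```

Each loop iteration is exactly the one-step update asked about in the question card: the new state depends only on the previous state and the single new pair (k'_j, v_j).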