Learn Before
Unrolling a Recurrent State Update
A system maintains two state variables, μ and ν, which are updated at each step i according to the following rules:
μ_i = μ_{i-1} + k'_i^T v_i
ν_i = ν_{i-1} + k'_i^T
Assuming the initial states μ₀ and ν₀ are both zero (every entry is 0), write the expanded mathematical expression for μ₃ solely in terms of the input vectors k' and v from steps 1, 2, and 3.
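To check a hand-derived expansion numerically, the update rules can be run step by step. The following is a minimal sketch, not a reference implementation. It assumes k'_i and v_i are row vectors, so k'_i^T v_i is an outer product and μ is a matrix with one row per key entry. The function name `unroll_state` and the sample inputs are hypothetical and chosen only for illustration.

```python
def outer(k, v):
    """Outer product k^T v of two row vectors given as lists (assumed convention)."""
    return [[k_a * v_b for v_b in v] for k_a in k]


def unroll_state(keys, values):
    """Apply mu_i = mu_{i-1} + k'_i^T v_i and nu_i = nu_{i-1} + k'_i^T,
    starting from all-zero states, and return the final (mu, nu)."""
    d_k, d_v = len(keys[0]), len(values[0])
    mu = [[0.0] * d_v for _ in range(d_k)]  # mu_0: all zeros
    nu = [0.0] * d_k                        # nu_0: all zeros
    for k, v in zip(keys, values):
        update = outer(k, v)
        mu = [[m + u for m, u in zip(mu_row, up_row)]
              for mu_row, up_row in zip(mu, update)]
        nu = [n + k_a for n, k_a in zip(nu, k)]
    return mu, nu


# Hypothetical inputs for steps 1, 2, and 3 (one row per step).
keys = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]
values = [[1.0, 0.0, 2.0], [2.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

mu_3, nu_3 = unroll_state(keys, values)
print(mu_3)
print(nu_3)
```

Evaluating your own expanded expression for μ₃ on the same inputs and comparing it with `mu_3` is a quick way to confirm the unrolling.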
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Computational and Memory Efficiency of Linear Attention's Recurrent Method
A sequential model updates two history-representing variables, μ and ν, at each step i using the following rules: μ_i = μ_{i-1} + k'_i^T v_i and ν_i = ν_{i-1} + k'_i^T. Consider the update at a single step i. If the input value vector v_i is a zero vector (a vector of all zeros), but the input key vector k'_i is a non-zero vector, what is the outcome of the update from step i-1 to step i?
Recurrent State Update Calculation
Unrolling a Recurrent State Update
Linear Attention Output Calculation