A sequential model updates two history-representing variables, μ and ν, at each step i using the following rules:
$$\mu_i = \mu_{i-1} + {k'_i}^{\top} v_i$$
$$\nu_i = \nu_{i-1} + {k'_i}^{\top}$$
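To make the two rules concrete, here is a minimal plain-Python sketch of one update step. It assumes that k'_i (the key after any feature mapping) and v_i are row vectors, so that k'_i^T v_i is a d_k × d_v outer-product matrix and k'_i^T is a length-d_k column vector. The helper names `outer` and `recurrent_update` are hypothetical and used only for illustration, and the state is assumed to start at zero.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def outer(k: Vector, v: Vector) -> Matrix:
    """Outer product k^T v: a (d_k x d_v) matrix built from row vectors k and v."""
    return [[ki * vj for vj in v] for ki in k]


def recurrent_update(mu: Matrix, nu: Vector, k: Vector, v: Vector) -> Tuple[Matrix, Vector]:
    """One step of the (assumed) recurrence:
    mu_i = mu_{i-1} + k'_i^T v_i   (d_k x d_v matrix)
    nu_i = nu_{i-1} + k'_i^T       (length-d_k column vector, stored as a list)
    """
    kv = outer(k, v)
    new_mu = [[m + a for m, a in zip(mu_row, kv_row)] for mu_row, kv_row in zip(mu, kv)]
    new_nu = [n + ki for n, ki in zip(nu, k)]
    return new_mu, new_nu


if __name__ == "__main__":
    d_k, d_v = 2, 3
    mu = [[0.0] * d_v for _ in range(d_k)]  # assumed zero initial state
    nu = [0.0] * d_k
    mu, nu = recurrent_update(mu, nu, k=[1.0, 2.0], v=[0.5, 0.0, -1.0])
    print(mu)  # [[0.5, 0.0, -1.0], [1.0, 0.0, -2.0]]
    print(nu)  # [1.0, 2.0]
```

Each step touches only the fixed-size state (μ, ν) and the current key and value, which is why the cost per step does not grow with the length of the history. Plugging different choices of k'_i and v_i into `recurrent_update` is a quick way to check how each term of the state responds.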
Consider the update at a single step i. If the input value vector v_i is a zero vector (a vector of all zeros), but the input key vector k'_i is a non-zero vector, what is the outcome of the update from step i-1 to step i?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Computational and Memory Efficiency of Linear Attention's Recurrent Method
Recurrent State Update Calculation
Unrolling a Recurrent State Update
Linear Attention Output Calculation