Learn Before
Computational and Memory Efficiency of Linear Attention's Recurrent Method
A primary benefit of the recurrent model utilizing and is that it eliminates the need to retain all past queries and values. By relying exclusively on the latest representations, and , the computational cost of each individual step remains constant. Consequently, this allows the model to be easily extended to handle very long sequences.
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Computational and Memory Efficiency of Linear Attention's Recurrent Method
A sequential model updates two history-representing variables, μ and ν, at each step
iusing the following rules:μ_i = μ_{i-1} + k'i^T * v_i ν_i = ν{i-1} + k'_i^T
Consider the update at a single step
i. If the input value vectorv_iis a zero vector (a vector of all zeros), but the input key vectork'_iis a non-zero vector, what is the outcome of the update from stepi-1to stepi?Recurrent State Update Calculation
Unrolling a Recurrent State Update
Linear Attention Output Calculation
Learn After
A model is designed to process a continuous, unending stream of input data. At each time step
i, it calculates an output that depends on all inputs from step 1 toi. The model's internal state is updated using a method where the state at stepiis a function of only the state at stepi-1and the current input at stepi. What is the primary implication of this update method for the model's memory requirements as the stream continues indefinitely?Architectural Trade-offs for Long-Sequence Processing
Analysis of Memory Scaling in Sequence Processing