Learn Before
General Recurrent Formula for Memory Update
The update process for a memory component in a memory-based attention mechanism can be described by a general recurrent function. At each time step i, the new memory state, Mem_new, is computed by a function f that takes the current key-value pair, (k_i, v_i), and the previous memory state, Mem_old, as its inputs:

Mem_new = f((k_i, v_i), Mem_old)

This general framework can be instantiated with specific models for the update function f, such as a recurrent neural network or a simple moving average.
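To make the recurrence concrete, here is a minimal Python sketch of one possible instantiation: f as a cumulative average of the keys and values seen so far. The names `cumulative_average_update` and `run_memory`, and the choice to store a running count inside the memory state, are assumptions made for illustration; the text above does not prescribe a particular memory layout or averaging formula.

```python
from typing import Callable, Iterable, List, Tuple

Vector = List[float]
KV = Tuple[Vector, Vector]
# Hypothetical memory layout (an assumption): (mean key, mean value, pairs seen)
Memory = Tuple[Vector, Vector, int]


def cumulative_average_update(kv: KV, mem_old: Memory) -> Memory:
    """One step of Mem_new = f((k_i, v_i), Mem_old), with f a running mean."""
    k, v = kv
    mean_k, mean_v, n = mem_old
    n_new = n + 1
    # Incremental mean: new_mean = old_mean + (x - old_mean) / n_new
    new_k = [m + (x - m) / n_new for m, x in zip(mean_k, k)]
    new_v = [m + (x - m) / n_new for m, x in zip(mean_v, v)]
    return new_k, new_v, n_new


def run_memory(
    f: Callable[[KV, Memory], Memory],
    pairs: Iterable[KV],
    mem_init: Memory,
) -> Memory:
    """Fold the update function over the sequence, one key-value pair at a time."""
    mem = mem_init
    for kv in pairs:
        # Only the current pair and the previous state are used at each step.
        mem = f(kv, mem)
    return mem


# Two toy key-value pairs with 2-dimensional keys and values.
pairs = [([1.0, 0.0], [2.0, 2.0]), ([3.0, 4.0], [0.0, 6.0])]
mem = run_memory(cumulative_average_update, pairs, ([0.0, 0.0], [0.0, 0.0], 0))
print(mem)
```

Because `run_memory` accepts any function with the signature `f(kv, mem_old)`, a different update rule (for example, a recurrent neural network cell) could be swapped in without changing the loop. The memory stays the same size however many pairs are processed.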

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Fixed-Size Window Memory as a Form of Local Attention
Summary Vectors for Memory Compression in Attention
General Recurrent Formula for Memory Update
Comparison of Memory Storage in Window-based and Moving Average Caches
Hybrid Cache for Attention Mechanisms
An attention mechanism is designed to use a memory component that has a constant, fixed size, regardless of how long the input sequence becomes. What is the primary computational consequence of this design choice as the input sequence length increases significantly?
Computational Cost Scaling in Attention Mechanisms
Optimizing a Real-Time Sequence Processing Model
Learn After
Neural Network as a Memory Component
Segment-Level Recurrence for Memory Models
A memory-based attention mechanism updates its fixed-size memory state, Mem, at each time step i using a general recurrent formula: Mem_new = f((k_i, v_i), Mem_old), where (k_i, v_i) is the current key-value pair and Mem_old is the memory state from the previous step. Which of the following update procedures does NOT conform to this recurrent structure?
Calculating a Recurrent Memory State
Consider a memory update process defined by the recurrent function Mem_new = f((k_i, v_i), Mem_old), where (k_i, v_i) is the input at the current step and Mem_old is the memory state from the previous step. To compute the memory state for step 100, this process requires direct access to the individual key-value pairs from all 99 preceding steps (i.e., from step 1 to 99).
Formula for Memory as a Cumulative Average of Keys and Values