Recursive Formula for Memory as a Cumulative Average
The cumulative average for the memory component can be calculated efficiently using a recursive formula. The memory state at the current step, $\mathrm{Mem}_i$, is computed from the current key-value pair $(\mathbf{k}_i, \mathbf{v}_i)$ and the memory state from the previous step, $\mathrm{Mem}_{i-1}$. With steps indexed from 0, the recursive formula is expressed as: $\mathrm{Mem}_i = \frac{(\mathbf{k}_i, \mathbf{v}_i) + i \cdot \mathrm{Mem}_{i-1}}{i+1}$. This approach avoids recalculating the sum of all historical key-value pairs at each step, so each update costs a constant amount of work regardless of how many pairs came before.
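The recursive update described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes steps are indexed from 0, that the memory is a pair (average key, average value), and that the update takes the form Mem_i = ((k_i, v_i) + i * Mem_{i-1}) / (i + 1). The names `update_memory`, `keys`, and `values` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

Vector = List[float]
Memory = Tuple[Vector, Vector]  # (average key, average value)


def update_memory(prev: Memory, key: Vector, value: Vector, i: int) -> Memory:
    """Return Mem_i from Mem_{i-1} and the pair (k_i, v_i), with i counted from 0."""
    prev_key, prev_value = prev
    # Weight the previous average by the i pairs it summarizes, add the new pair,
    # then divide by the new count i + 1.
    new_key = [(k + i * m) / (i + 1) for k, m in zip(key, prev_key)]
    new_value = [(v + i * m) / (i + 1) for v, m in zip(value, prev_value)]
    return new_key, new_value


# Toy data (assumed for illustration): three 2-dimensional key-value pairs.
keys = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
values = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]

# The initial state is a placeholder; at i = 0 it is multiplied by 0 and ignored.
memory: Memory = ([0.0, 0.0], [0.0, 0.0])
for i, (k, v) in enumerate(zip(keys, values)):
    memory = update_memory(memory, k, v, i)

print(memory)
```

Each call touches only the previous state and the new pair, so the cost per step depends on the vector dimension but not on the number of pairs already processed.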

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A real-time analytics system tracks the average score for a popular online game. After 1,000,000 games have been played, the system has calculated a cumulative average score. When the 1,000,001st game score is recorded, which of the following methods for updating the average is the most computationally efficient and mathematically correct?
Real-time Performance Monitoring
Calculating an Updated Cumulative Average
Recurrent Memory Models as a Basis for Self-Attention Alternatives
A recurrent model with an internal state h is processing a sequence of inputs. The state is updated at each step according to the rule h_i = f(h_{i-1}, input_i), where h_{i-1} is the state from the previous step and input_i is the current input. When the model processes the third input in a sequence, what information does the term h_2 (the state after the second input) represent in the computation for the new state h_3?
Analysis of Sequential Information Processing
A neural network processes a sequence of inputs by updating a hidden state h at each step i using the formula: h_i = f(h_{i-1}, input_i). Which component in this formula is directly responsible for carrying forward a compressed summary of the entire sequence processed up to the previous step (i-1)?
Recurrent Computation of and in Linear Attention
Real-Time Applications of Recurrent Models
Resurgence of Recurrent Models in Large Language Models
Sequential Token Processing in Recurrent Models
Comparison of Efficient LLM Architectures
Diagnosing Semantic Repetition
A memory mechanism calculates its final state by taking the cumulative average of all key vectors and all value vectors in a sequence of 100 tokens. How does the influence of the key vector from the first token (at index 0) on the final memory state compare to the influence of the key vector from the last token (at index 99)?
Calculating a Memory State
Learn After
Memory Efficiency of Recursive Cumulative Average
A system processes a very long sequence of data pairs, one at a time. At each step, it must update a 'memory state' to be the precise mathematical average of all data pairs seen up to that point. Consider two methods for updating the memory state at the 10,000th step:
Method A: Re-access the entire history of 10,000 data pairs, sum them up, and divide by 10,000.
Method B: Use only the memory state from the 9,999th step (which was the average of the first 9,999 pairs) and the new 10,000th data pair to calculate the new average.
Which statement best analyzes the primary advantage of Method B over Method A in this context?
Calculating a Recursive Memory State
Step-by-Step Memory State Calculation
Inference Efficiency of Cumulative Average Memory
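The contrast between Method A and Method B in the question above can be sketched as follows. This is a hedged illustration only: it uses scalar values instead of data pairs to keep the example short, and the names `method_a`, `method_b`, and `data` are hypothetical. Method B is written in the incremental form average += (x - average) / n, which is algebraically the same as (x + (n - 1) * average) / n.

```python
from typing import Iterable, Iterator, List


def method_a(history: List[float]) -> float:
    """Recompute the average from the full stored history (time and memory grow with n)."""
    return sum(history) / len(history)


def method_b(stream: Iterable[float]) -> Iterator[float]:
    """Yield the running average, keeping only the previous average and a count."""
    average = 0.0
    for n, x in enumerate(stream, start=1):
        # Same result as (x + (n - 1) * average) / n, without storing past items.
        average += (x - average) / n
        yield average


# Toy stream (assumed for illustration): 10,000 scalar observations.
data = [float(x % 7) for x in range(10_000)]

final_a = method_a(data)             # needs the whole list in memory
*_, final_b = method_b(iter(data))   # consumes the items one at a time

# The two methods agree up to floating-point rounding.
print(abs(final_a - final_b) < 1e-9)
```

Method B never needs the earlier items again, so it can run over a stream that is too long to store, whereas Method A must keep and re-read the full history at every step.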