Memory Efficiency of Recursive Cumulative Average
Computing the cumulative average for the memory component with a recursive formula is memory efficient. During inference, the method needs to store only a single key-value pair, the average from the previous step, together with the current step index, rather than retaining the entire history of key-value pairs. The memory required therefore stays constant no matter how long the sequence grows.
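A minimal sketch of the recursive update, assuming each key and each value is a plain list of floats and that the step index is tracked alongside the running average. The names `update_running_average` and `running_average` are hypothetical, chosen only for illustration. The incremental form `m + (x - m) / i` used here is algebraically the same as `(m * (i - 1) + x) / i`.

```python
def update_running_average(prev_avg, new_pair, step):
    """Average of the first `step` pairs, given the average of the first `step - 1`.

    prev_avg and new_pair are (key, value) tuples of float lists (an assumed layout).
    """
    return tuple(
        [m + (x - m) / step for m, x in zip(prev_vec, new_vec)]
        for prev_vec, new_vec in zip(prev_avg, new_pair)
    )


def running_average(pairs):
    """Consume (key, value) pairs one at a time, keeping only the current average."""
    avg = None
    for step, pair in enumerate(pairs, start=1):
        if avg is None:
            # The average of a single pair is the pair itself.
            avg = (list(pair[0]), list(pair[1]))
        else:
            avg = update_running_average(avg, pair, step)
    return avg


if __name__ == "__main__":
    # A generator stands in for a long sequence: no history is ever materialised.
    stream = (([float(i)], [2.0 * i]) for i in range(1, 10_001))
    print(running_average(stream))
```

Because `running_average` accepts any iterable, the pairs can be produced lazily. At every step the only state kept is `avg` and `step`, whereas the naive approach would have to hold all pairs seen so far in order to re-sum them.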
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A system processes a very long sequence of data pairs, one at a time. At each step, it must update a 'memory state' to be the exact arithmetic mean of all data pairs seen up to that point. Consider two methods for updating the memory state at the 10,000th step:
Method A: Re-access the entire history of 10,000 data pairs, sum them up, and divide by 10,000.
Method B: Use only the memory state from the 9,999th step (which was the average of the first 9,999 pairs) and the new 10,000th data pair to calculate the new average.
Which statement best analyzes the primary advantage of Method B over Method A in this context?
Calculating a Recursive Memory State
Step-by-Step Memory State Calculation
Inference Efficiency of Cumulative Average Memory
Learn After
Analysis of Memory Efficiency in Running Average Algorithms
A language model processes a very long document (10,000 tokens) and maintains a memory state by computing the cumulative average of all key-value pairs from the beginning of the sequence. If this average is updated at each step using a recursive formula, what information from the past must be stored in memory to compute the state for the 10,000th token?
Memory Usage Comparison: Recursive vs. Naive Cumulative Average