Inference Efficiency of Cumulative Average Memory
A key advantage of using a recursive formula for the cumulative average memory model is its efficiency during inference. Because the new memory state can be calculated from only the previous state, the current key-value pair, and the step index, the model does not need to store the entire history of preceding key-value pairs. The memory requirement therefore stays constant: one averaged key-value pair plus a step count, regardless of sequence length, which makes the approach well suited to long sequences.
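The recursive update can be illustrated with a minimal Python sketch. It assumes keys and values are plain lists of floats, that steps are counted from 1, and that the update takes the form Mem_i = (Mem_{i-1} * (i - 1) + pair_i) / i. The names update_memory and average_stream are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Vector = List[float]


def update_memory(
    prev_key_avg: Vector,
    prev_value_avg: Vector,
    key: Vector,
    value: Vector,
    step: int,
) -> Tuple[Vector, Vector]:
    """Return the averaged key and value after `step` pairs (1-indexed).

    Only the previous averages, the new pair, and the step index are used;
    no earlier pairs are read.
    """
    new_key = [(m * (step - 1) + k) / step for m, k in zip(prev_key_avg, key)]
    new_value = [(m * (step - 1) + v) / step for m, v in zip(prev_value_avg, value)]
    return new_key, new_value


def average_stream(
    pairs: Iterable[Tuple[Vector, Vector]],
) -> Tuple[Optional[Vector], Optional[Vector]]:
    """Consume key-value pairs one at a time, keeping a single running state."""
    key_avg: Optional[Vector] = None
    value_avg: Optional[Vector] = None
    for step, (key, value) in enumerate(pairs, start=1):
        if key_avg is None or value_avg is None:
            # The first pair is its own average.
            key_avg, value_avg = list(key), list(value)
        else:
            key_avg, value_avg = update_memory(key_avg, value_avg, key, value, step)
    return key_avg, value_avg


# Assumed toy data: three key-value pairs.
pairs = [
    ([1.0, 2.0], [10.0]),
    ([3.0, 4.0], [20.0]),
    ([5.0, 6.0], [30.0]),
]
final_key_avg, final_value_avg = average_stream(pairs)
```

At every step the only data held between iterations is the current averaged pair and the step counter, so the storage cost does not grow with the number of pairs processed.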
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Memory Efficiency of Recursive Cumulative Average
A system processes a very long sequence of data pairs, one at a time. At each step, it must update a 'memory state' to be the precise mathematical average of all data pairs seen up to that point. Consider two methods for updating the memory state at the 10,000th step:
Method A: Re-access the entire history of 10,000 data pairs, sum them up, and divide by 10,000.
Method B: Use only the memory state from the 9,999th step (which was the average of the first 9,999 pairs) and the new 10,000th data pair to calculate the new average.
Which statement best analyzes the primary advantage of Method B over Method A in this context?
Calculating a Recursive Memory State
Step-by-Step Memory State Calculation
Learn After
Analysis of Memory Efficiency in Sequential Processing
A system is designed to process a continuous stream of data points (e.g., sensor readings) and must maintain an up-to-date average of all points seen so far. Consider two approaches for updating this average after receiving the Nth data point:
Approach 1: Uses a recursive formula that takes the previous average (calculated up to point N-1) and the new Nth data point to compute the new average.
Approach 2: Stores every single data point from 1 to N in a list and recalculates the average of the entire list every time a new point arrives.
As the number of data points (N) grows very large, what is the most significant difference in the memory requirements between these two approaches?
A system uses a recursive formula to update its memory state, where the new state Mem_i is calculated based on the previous state Mem_{i-1} and the current input item_i. For this system to correctly calculate the state at step 1,000,000, it must store all one million individual inputs from item_1 to item_{1,000,000} in its memory.