Formula for Memory as a Cumulative Average of Keys and Values
The cumulative average of the keys and values for a memory component ($\mathrm{Mem}$) is a tuple containing the cumulative average of the key vectors ($\mathbf{k}_j$) and of the value vectors ($\mathbf{v}_j$) up to a given index $i$. Each average is obtained by summing all vectors from the start of the sequence (index $0$) to the current index $i$ and then normalizing by the total number of elements, $i + 1$. The mathematical representation is:

$$\mathrm{Mem} = \left( \frac{\sum_{j=0}^{i} \mathbf{k}_j}{i + 1},\ \frac{\sum_{j=0}^{i} \mathbf{v}_j}{i + 1} \right)$$
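The definition can be illustrated with a minimal sketch in plain Python. It assumes 0-based indexing, so the memory at index i averages i + 1 vectors. The names `mean_of_vectors` and `cumulative_average_memory` and the toy key and value vectors are hypothetical, chosen only for this example.

```python
from typing import List, Tuple

Vector = List[float]


def mean_of_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / n for d in range(dim)]


def cumulative_average_memory(
    keys: List[Vector], values: List[Vector], i: int
) -> Tuple[Vector, Vector]:
    """Return the memory at index i: averages of keys[0..i] and values[0..i]."""
    if len(keys) != len(values) or not 0 <= i < len(keys):
        raise ValueError("i must index into keys and values of equal length")
    # Indices 0..i inclusive give i + 1 vectors, hence the 1 / (i + 1) factor.
    return mean_of_vectors(keys[: i + 1]), mean_of_vectors(values[: i + 1])


# Toy data (hypothetical): three 2-dimensional key and value vectors.
keys = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
values = [[0.0, 6.0], [2.0, 0.0], [4.0, 3.0]]

mem_keys, mem_values = cumulative_average_memory(keys, values, i=2)
print(mem_keys, mem_values)  # [2.0, 2.0] [2.0, 3.0]
```

Every vector from index 0 to i enters the result with the same weight, 1 / (i + 1), so no position is favored over another.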

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Neural Network as a Memory Component
Segment-Level Recurrence for Memory Models
A memory-based attention mechanism updates its fixed-size memory state, Mem, at each time step i using a general recurrent formula: Mem_new = f((k_i, v_i), Mem_old), where (k_i, v_i) is the current key-value pair and Mem_old is the memory state from the previous step. Which of the following update procedures does NOT conform to this recurrent structure?
Calculating a Recurrent Memory State
Consider a memory update process defined by the recurrent function Mem_new = f((k_i, v_i), Mem_old), where (k_i, v_i) is the input at the current step and Mem_old is the memory state from the previous step. To compute the memory state for step 100, this process requires direct access to the individual key-value pairs from all 99 preceding steps (i.e., from step 1 to 99).
Formula for Memory as a Cumulative Average of Keys and Values
Recursive Formula for Cumulative Average
A language model's memory component is designed to create a summary vector at each step by calculating the average of all key-value pairs from the start of the sequence up to that current step. When this model is processing a very long sequence, what is the effect on the summary vector's representation of information from the very beginning of the sequence as the model approaches the end?
Analysis of Memory Summary Techniques
Selecting a Memory Summarization Strategy
Learn After
Diagnosing Semantic Repetition
A memory mechanism calculates its final state by taking the cumulative average of all key vectors and all value vectors in a sequence of 100 tokens. How does the influence of the key vector from the first token (at index 0) on the final memory state compare to the influence of the key vector from the last token (at index 99)?
Calculating a Memory State
Recursive Formula for Memory as a Cumulative Average
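The recurrent form Mem_new = f((k_i, v_i), Mem_old) mentioned in the related items can be sketched for the cumulative average using the standard running-mean identity: new mean = (i * old mean + new vector) / (i + 1). This is an illustrative sketch, not necessarily the formulation used in the linked item. It assumes the 0-based step index i is available to the update, and the name `update_memory` and the toy data are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Memory = Tuple[Vector, Vector]


def update_memory(kv: Tuple[Vector, Vector], mem_old: Memory, i: int) -> Memory:
    """One step of Mem_new = f((k_i, v_i), Mem_old) for a running average.

    i is the 0-based index of the incoming pair, so mem_old averages i pairs.
    """
    k_i, v_i = kv
    mean_k, mean_v = mem_old
    # Running-mean identity: re-weight the old mean by i, add the new vector,
    # and divide by the new count i + 1.
    new_k = [(i * m + x) / (i + 1) for m, x in zip(mean_k, k_i)]
    new_v = [(i * m + x) / (i + 1) for m, x in zip(mean_v, v_i)]
    return new_k, new_v


# Toy data (hypothetical): three 2-dimensional key and value vectors.
keys = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
values = [[0.0, 6.0], [2.0, 0.0], [4.0, 3.0]]

# At i = 0 the old memory has weight 0, so any same-sized placeholder works.
mem: Memory = ([0.0, 0.0], [0.0, 0.0])
for i, kv in enumerate(zip(keys, values)):
    mem = update_memory(kv, mem, i)

print(mem)  # ([2.0, 2.0], [2.0, 3.0])
```

Each update reads only the current pair and the previous memory state, so the earlier key-value pairs never need to be stored, yet the result matches the average over all of them.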