Analysis of Memory Summary Techniques
A language model's memory component can be summarized using two different averaging techniques. Technique 1 calculates the summary by averaging all key-value pairs from the start of the sequence up to the current position. Technique 2 calculates the summary by averaging only the key-value pairs within a fixed-size window of the most recent positions. Compare these two techniques, explaining the primary advantage and disadvantage of Technique 1 (the cumulative approach) regarding its representation of the sequence's history.
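The two techniques above can be contrasted with a minimal sketch. This is an illustration under simplifying assumptions, not the book's implementation: each key-value pair is collapsed into a single NumPy vector, and the window size of 4 is an arbitrary choice.

```python
import numpy as np

def cumulative_summary(kv_pairs):
    """Technique 1: at each position i, average ALL key-value pairs
    from the start of the sequence up to position i."""
    summaries = []
    total = np.zeros_like(kv_pairs[0], dtype=float)
    for i, kv in enumerate(kv_pairs, start=1):
        total += kv
        summaries.append(total / i)  # mean over positions 1..i
    return summaries

def windowed_summary(kv_pairs, window=4):
    """Technique 2: at each position i, average only the key-value
    pairs inside a fixed-size window of the most recent positions."""
    summaries = []
    for i in range(1, len(kv_pairs) + 1):
        recent = kv_pairs[max(0, i - window):i]
        summaries.append(np.mean(recent, axis=0))
    return summaries
```

With a toy sequence whose pairs are the vectors [1], [2], ..., [8], the final cumulative summary is the mean of all eight values (4.5), while the final windowed summary with window=4 is the mean of only the last four (6.5) — Technique 1 retains the whole history in compressed form, whereas Technique 2 forgets everything outside the window.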
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Recursive Formula for Cumulative Average
A language model's memory component is designed to create a summary vector at each step by calculating the average of all key-value pairs from the start of the sequence up to that current step. When this model is processing a very long sequence, what is the effect on the summary vector's representation of information from the very beginning of the sequence as the model approaches the end?
Selecting a Memory Summarization Strategy
Formula for Memory as a Cumulative Average of Keys and Values
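The dilution effect described in the related question above can be made concrete with a short sketch. This is a hypothetical scalar illustration, not the source's formula: it uses the standard recursive form of a running mean, in which each new pair enters with weight 1/t and every earlier contribution is rescaled by (t-1)/t.

```python
def cumulative_memory(kv_pairs):
    """Recursive form of the cumulative average:
    M_t = ((t - 1) / t) * M_{t-1} + kv_t / t."""
    mem = 0.0
    for t, kv in enumerate(kv_pairs, start=1):
        mem = ((t - 1) / t) * mem + kv / t
    return mem

def first_position_weight(t):
    """Weight of the very first key-value pair after t steps: it enters
    with weight 1, then is rescaled by (s - 1) / s at every later step s,
    so the product telescopes to 1 / t."""
    w = 1.0
    for s in range(2, t + 1):
        w *= (s - 1) / s
    return w
```

After 1,000 steps the first position's weight has shrunk to 1/1000: information from the beginning of a long sequence is never discarded outright, but its influence on the summary vector becomes vanishingly small as the sequence grows.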