Selecting a Memory Summarization Strategy
Consider two different methods for creating a memory summary vector at each step in a sequence:
- Method 1: The summary is the average of all key-value pairs from the beginning of the sequence up to the current position.
- Method 2: The summary is the average of only the 50 most recent key-value pairs (a fixed-size sliding window).
For each scenario described below, determine which method (1 or 2) is more suitable and briefly justify your choice by explaining the trade-offs of each method in that context.
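To make the trade-off concrete, the two methods can be sketched as below. This is a minimal illustration, not the course's reference implementation. It assumes each key-value pair has already been represented as a single list of floats, and the function names `cumulative_summaries` and `windowed_summaries` are hypothetical. The window size of 50 comes from the description of Method 2.

```python
from collections import deque

WINDOW = 50  # window size stated for Method 2


def cumulative_summaries(vectors):
    """Method 1: at each step, the mean of every vector seen so far."""
    summaries = []
    running_sum = None
    for step, vec in enumerate(vectors, start=1):
        if running_sum is None:
            running_sum = list(vec)
        else:
            running_sum = [s + x for s, x in zip(running_sum, vec)]
        # Only the running sum and the step count need to be stored.
        summaries.append([s / step for s in running_sum])
    return summaries


def windowed_summaries(vectors, window=WINDOW):
    """Method 2: at each step, the mean of the most recent `window` vectors."""
    summaries = []
    recent = deque(maxlen=window)  # older vectors are dropped automatically
    for vec in vectors:
        recent.append(list(vec))
        dim = len(vec)
        summaries.append(
            [sum(v[d] for v in recent) / len(recent) for d in range(dim)]
        )
    return summaries


# Toy sequence (assumed for illustration): 100 steps of [1.0], then 100 of [0.0].
sequence = [[1.0]] * 100 + [[0.0]] * 100
cumulative = cumulative_summaries(sequence)
windowed = windowed_summaries(sequence)

print(cumulative[-1])  # [0.5]: early content still contributes, but diluted
print(windowed[-1])    # [0.0]: everything older than 50 steps is forgotten
```

In this toy sequence, Method 1 retains a trace of the early vectors, but each one's weight shrinks as 1/t as the sequence grows. Method 2 tracks recent content sharply and discards anything older than the window. Method 1 needs only a running sum and a count, whereas Method 2 must keep the last 50 vectors in memory.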
Related
Recursive Formula for Cumulative Average
A language model's memory component is designed to create a summary vector at each step by calculating the average of all key-value pairs from the start of the sequence up to that current step. When this model is processing a very long sequence, what is the effect on the summary vector's representation of information from the very beginning of the sequence as the model approaches the end?
Analysis of Memory Summary Techniques
Formula for Memory as a Cumulative Average of Keys and Values