Evaluating Memory Models in Attention Mechanisms
An engineering team is designing a language model and is considering two approaches for the memory component (Mem) in the attention operation Att(q_i, Mem).
- Approach 1: The memory component Mem consists of the complete, unaltered set of all key and value vectors generated up to the current position i.
- Approach 2: The memory component Mem is a compressed, fixed-size summary of all key and value vectors generated up to the current position i.
Evaluate the primary trade-off between these two approaches, considering both computational resource usage during text generation and the potential impact on the model's ability to handle long-range dependencies in the text. Justify your evaluation.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model generates text token by token. At each step 'i', an attention operation computes an output using a query vector and a memory component. In a standard causal implementation, this memory component is defined as the complete set of key and value vectors from all previous steps (1 to i). Based on this definition, what is the direct relationship between the size of this memory component and the length of the generated sequence 'i'?
Sparse Attention with a Fixed Key-Value Subset
Evaluating Memory Models in Attention Mechanisms
Evaluating an Attention Mechanism for a Real-Time Application