
Learn Before

Summary Vectors for Memory Compression in Attention

Multiple Choice

An engineer is designing a language model that must process very long sequences while keeping the computational cost of attention constant at each step. They are considering two approaches for the model's memory component:

* **Approach 1:** The memory stores the raw key-value pairs from the 256 most recent positions in the sequence.
* **Approach 2:** The memory is a pair of fixed-size 'summary' vectors, which are calculated by mathematically combining all preceding key-value pairs into a si
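
The two memory designs can be sketched side by side. This is a minimal illustration, not a reference implementation: the class names `WindowMemory` and `SummaryMemory` are hypothetical, and because the text above does not specify how Approach 2 combines the pairs, the sketch assumes a cumulative (running) mean.

```python
from collections import deque


class WindowMemory:
    """Approach 1: keep the raw (key, value) pairs of the most recent positions."""

    def __init__(self, window=256):
        # A deque with maxlen drops the oldest pair automatically when full.
        self.pairs = deque(maxlen=window)

    def update(self, key, value):
        self.pairs.append((key, value))

    def size(self):
        # Number of stored pairs; never exceeds the window length.
        return len(self.pairs)


class SummaryMemory:
    """Approach 2: one summary key vector and one summary value vector.

    Assumption: the summary is a cumulative mean over all pairs seen so far.
    The source does not state which combining function is used.
    """

    def __init__(self, dim):
        self.count = 0
        self.key_summary = [0.0] * dim
        self.value_summary = [0.0] * dim

    def update(self, key, value):
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.count += 1
        n = self.count
        self.key_summary = [s + (k - s) / n for s, k in zip(self.key_summary, key)]
        self.value_summary = [s + (v - s) / n for s, v in zip(self.value_summary, value)]

    def size(self):
        # Always a single (key, value) summary pair, regardless of sequence length.
        return 1


# Feed 1000 toy positions into both memories (the vectors are made-up data).
window_mem = WindowMemory(window=256)
summary_mem = SummaryMemory(dim=4)
for t in range(1000):
    k = [float(t)] * 4
    v = [float(2 * t)] * 4
    window_mem.update(k, v)
    summary_mem.update(k, v)
```

In this sketch both memories stay a fixed size as the sequence grows, so attention over either one has a bounded per-step cost. The difference lies in what is retained: the window keeps exact pairs but nothing older than 256 positions, while the summary is lossy but is influenced by every preceding position.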

Updated 2025-09-28

