An attention mechanism processes a long sequence of information but considers only the 16 most recent key-value pairs when computing its output. One design stores all 16 pairs directly in a cache. An alternative design compresses the same 16 pairs into a single, averaged key-value pair. Assuming each key and each value is a single vector of the same size, what is the ratio of the memory required by the first design (storing all pairs) to that required by the second design (storing the compressed pair)?
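One way to check the arithmetic is to count stored vectors directly. Below is a minimal sketch in Python; the variable names and the 64-dimensional vectors are illustrative assumptions, not part of the question. The full cache holds 16 keys plus 16 values (32 vectors), while the compressed design averages them down to one key plus one value (2 vectors), so the ratio is 16:1 regardless of vector width.

```python
import numpy as np

# Illustrative setup: 16 cached key-value pairs, each key/value a
# d-dimensional vector (d = 64 is an arbitrary choice; the final
# ratio does not depend on it).
n_pairs, d = 16, 64
keys = np.random.randn(n_pairs, d)
values = np.random.randn(n_pairs, d)

# Design 1: store all 16 key-value pairs verbatim.
full_cache_vectors = keys.shape[0] + values.shape[0]   # 16 keys + 16 values = 32

# Design 2: average the 16 pairs into a single key-value pair.
compressed_key = keys.mean(axis=0)      # one d-dimensional key
compressed_value = values.mean(axis=0)  # one d-dimensional value
compressed_cache_vectors = 2            # 1 key + 1 value

# Memory ratio of design 1 to design 2, counted in stored vectors
# (equivalently in bytes, since every vector has the same size).
ratio = full_cache_vectors / compressed_cache_vectors
print(ratio)  # -> 16.0
```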
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Memory-Efficient Cache Strategy Selection
Cache Suitability for High-Fidelity Tasks