Integrating k-NN Memory with Local Memory in Attention
To enhance the attention mechanism for a given query q, language models can draw on two sources of context: the immediate local memory, i.e., the standard Key-Value (KV) cache of recent tokens, denoted (K_local, V_local), and the long-term memory retrieved via k-nearest neighbors, denoted (K_knn, V_knn). Strategies for integrating these two sources include concatenating them into a single, unified KV cache, (K_merged, V_merged) = ([K_knn; K_local], [V_knn; V_local]), and applying standard QKV attention over the result, or attending to (K_knn, V_knn) and (K_local, V_local) in two separate, distinct attention steps whose outputs are then combined, for example as a linear combination.
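The two integration strategies can be sketched in a few lines of NumPy. This is a minimal single-head, single-query illustration, not a production implementation; the symbol names (K_local, K_knn, etc.) and the fixed gate value g are assumptions chosen for the example, since in practice the gate is typically a learned, per-head parameter.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, K, V):
    # Scaled dot-product attention for one query.
    # q: (d,), K: (n, d), V: (n, d) -> output: (d,)
    scores = K @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d = 16
q = rng.standard_normal(d)                       # current query
K_local = rng.standard_normal((8, d))            # KV cache of recent tokens
V_local = rng.standard_normal((8, d))
K_knn = rng.standard_normal((4, d))              # k-NN-retrieved long-term memory
V_knn = rng.standard_normal((4, d))

# Strategy 1: merge both memories into one KV cache and run a single
# softmax over all 12 entries, so local and retrieved tokens compete
# directly for attention mass.
K_merged = np.concatenate([K_knn, K_local])
V_merged = np.concatenate([V_knn, V_local])
out_merged = attention(q, K_merged, V_merged)

# Strategy 2: attend to each memory separately, then blend the two
# outputs with a gate g in [0, 1] (a linear combination).
g = 0.3  # hypothetical fixed gate; usually learned per head
out_split = g * attention(q, K_knn, V_knn) + (1 - g) * attention(q, K_local, V_local)
```

The merged-cache variant lets the softmax normalize across both sources jointly, while the split variant keeps the two distributions independent and delegates their relative weighting to the gate.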
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
k-NN Memory Retrieval
Populating a k-NN Datastore for Language Modeling
Equivalence Between k-NN and Sparse Attention Models
k-NN Language Modeling (k-NN LM)
Vector Database
A language model is designed to be a question-answering assistant for a large corporate knowledge base containing thousands of separate project documents. A user asks a question about 'Project Alpha,' but the most relevant technical detail needed to answer it is located in a document for 'Project Zeta,' a completely unrelated past project. Which statement best explains the unique advantage of using a k-nearest neighbors (k-NN) based external memory system in this scenario?
Analyzing Long-Range Consistency in Language Models
In a k-NN based external memory system, the datastore of key-value pairs is limited to representing only the context states from the current, single sequence being processed.
Learn After
Combined KV Cache for k-NN and Local Memory
k-NN Search Augmented Attention
Optimizing Attention for a Specialized Chatbot
A team is designing a language model for a legal chatbot. The model must be able to follow the immediate flow of a user's query while also referencing specific, relevant legal precedents from a massive, static database. Which of the following approaches for the model's attention mechanism best addresses this dual requirement?
Diagnosing Memory Deficiencies in a Chatbot
Linear Combination of Local and External Attention