Concept

Integrating k-NN Memory with Local Memory in Attention

To enhance the attention mechanism for a given query $\mathbf{q}_i$, language models aim to utilize both the immediate local memory (the standard key-value (KV) cache of recent tokens, denoted $\mathrm{Mem}$) and the long-term memory retrieved via $k$-nearest neighbors, denoted $\mathrm{Mem}_{k\mathrm{nn}}$. Strategies to integrate these two sources of information include concatenating them into a single, unified KV cache, $[\mathrm{Mem}, \mathrm{Mem}_{k\mathrm{nn}}]$, and applying standard QKV attention over it, or attending to $\mathrm{Mem}$ and $\mathrm{Mem}_{k\mathrm{nn}}$ in separate, distinct attention steps and then combining the results.
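The two integration strategies can be sketched for a single query vector as follows. This is a minimal NumPy illustration, not a definitive implementation: the function names and the gating scalar `g` used to blend the two attention outputs in the separate-step variant are assumptions for this sketch (in practice the gate is typically a learned, per-head parameter, as in memorizing-transformer-style models).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, K, V):
    # scaled dot-product attention for a single query vector q
    scores = K @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ V

def unified_attention(q, K_local, V_local, K_knn, V_knn):
    # Strategy 1: concatenate [Mem, Mem_knn] into one KV cache,
    # then apply standard QKV attention over the combined cache.
    K = np.concatenate([K_local, K_knn], axis=0)
    V = np.concatenate([V_local, V_knn], axis=0)
    return attend(q, K, V)

def separate_attention(q, K_local, V_local, K_knn, V_knn, g=0.5):
    # Strategy 2: two distinct attention steps, one over the local
    # KV cache and one over the k-NN retrieved memory, blended by a
    # gate g (the fixed scalar gate is an illustrative assumption).
    out_local = attend(q, K_local, V_local)
    out_knn = attend(q, K_knn, V_knn)
    return g * out_local + (1.0 - g) * out_knn

# Example usage with random keys/values
rng = np.random.default_rng(0)
d = 8
q = rng.standard_normal(d)
K_l, V_l = rng.standard_normal((4, d)), rng.standard_normal((4, d))
K_k, V_k = rng.standard_normal((6, d)), rng.standard_normal((6, d))
out_unified = unified_attention(q, K_l, V_l, K_k, V_k)
out_separate = separate_attention(q, K_l, V_l, K_k, V_k)
```

Note the trade-off the sketch makes visible: the unified cache lets local and retrieved tokens compete in one softmax, while the separate-step variant normalizes each memory independently and controls their mix explicitly through the gate.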

Updated 2026-04-23

Tags

Ch.2 Generative Models - Foundations of Large Language Models
