Combined KV Cache for k-NN and Local Memory
One straightforward method for integrating retrieved k-NN memory is to concatenate it with the local memory. In this approach, the local memory (Mem_local) and the retrieved k-NN memory (Mem_knn) are combined to form a single, larger key-value cache, Mem = Mem_knn ∪ Mem_local. The model then performs a standard query-key-value attention operation over this unified cache for a given query q.
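The combined-cache idea can be sketched in a few lines of NumPy. This is a minimal single-query, single-head illustration, not the book's implementation: the function names and the assumption that retrieved entries are simply prepended to the local cache are choices made here for clarity.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_over_combined_cache(q, k_local, v_local, k_knn, v_knn):
    """Standard scaled dot-product attention over the union of the
    retrieved k-NN memory and the local KV cache."""
    # Concatenate the two memories into one larger KV cache
    # (here the retrieved entries are placed before the local ones).
    K = np.concatenate([k_knn, k_local], axis=0)  # (n_knn + n_local, d)
    V = np.concatenate([v_knn, v_local], axis=0)  # (n_knn + n_local, d)
    scores = q @ K.T / np.sqrt(K.shape[-1])       # (n_knn + n_local,)
    weights = softmax(scores)                     # attention over both memories
    return weights @ V                            # (d,)
```

Note that attention weights are normalized jointly over both memories, so local and retrieved entries compete directly for attention mass; the trade-off is that the attention cost grows with the size of the retrieved set.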
Tags
Ch.2 Generative Models - Foundations of Large Language Models