Concept

Combined KV Cache for k-NN and Local Memory

One straightforward method for integrating retrieved $k$-NN memory is to concatenate it with the local memory. In this approach, the local memory ($\mathrm{Mem}$) and the $k$-NN memory ($\mathrm{Mem}_{k\mathrm{nn}}$) are combined to form a single, larger Key-Value cache, represented as $[\mathrm{Mem}, \mathrm{Mem}_{k\mathrm{nn}}]$. The model then performs a standard query-key-value attention operation on this unified cache for a given query $\mathbf{q}_i$.
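The concatenation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: it assumes a single attention head, no causal masking, and the usual scaled dot-product form with a 1/sqrt(d) factor. The function name `combined_cache_attention` and the array names `local_k`, `local_v`, `knn_k`, `knn_v` are hypothetical and do not come from the source.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def combined_cache_attention(q, local_k, local_v, knn_k, knn_v):
    """Attend over [Mem, Mem_knn] for one query vector q (assumed shapes below).

    q:        (d,)   query vector
    local_*:  (n, d) keys/values of the local memory
    knn_*:    (k, d) keys/values of the retrieved k-NN memory
    """
    # Build the unified KV cache by stacking local and k-NN entries.
    keys = np.concatenate([local_k, knn_k], axis=0)      # (n + k, d)
    values = np.concatenate([local_v, knn_v], axis=0)    # (n + k, d)
    # Standard scaled dot-product attention over the combined cache
    # (the 1/sqrt(d) scaling is an assumption of this sketch).
    scores = keys @ q / np.sqrt(q.shape[-1])             # (n + k,)
    weights = softmax(scores)
    return weights @ values, weights

# Toy example with random data: 4 local entries, 3 retrieved entries.
rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=d)
local_k, local_v = rng.normal(size=(4, d)), rng.normal(size=(4, d))
knn_k, knn_v = rng.normal(size=(3, d)), rng.normal(size=(3, d))
out, weights = combined_cache_attention(q, local_k, local_v, knn_k, knn_v)
```

Because both memories share one softmax, the local and retrieved entries compete for the same attention mass: the weights over all n + k entries sum to one, and no separate gating between the two sources is needed.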

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences