Equivalence Between k-NN and Sparse Attention Models
For standard language modeling tasks, the context consists of all previously seen tokens in a sequence, so the key-value pairs of every preceding token are retained and added to the datastore. When a k-NN-based attention model operates over such a datastore, one that contains only the current sequence's history, it becomes essentially equivalent to a sparse attention model: retrieving the k nearest keys simply selects a subset of past tokens to attend to, which is exactly what a sparse attention pattern does. This shows the functional overlap between using an external retrieval datastore for past tokens and applying a sparse attention mechanism.
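A minimal NumPy sketch of this equivalence (function names and dimensions are illustrative, not from the source): when the datastore holds exactly the current sequence's history and k covers every past token, k-NN retrieval followed by attention reproduces full causal attention; with k smaller than the history length it attends over only the top-scoring past tokens, i.e. a sparse attention pattern.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def causal_attention(q, keys, values):
    # Standard attention of a single query over all preceding tokens.
    scores = keys @ q / np.sqrt(q.shape[0])
    return softmax(scores) @ values

def knn_attention(q, ds_keys, ds_values, k):
    # Retrieve the k nearest datastore keys (by inner product),
    # then attend only over the retrieved subset.
    scores = ds_keys @ q / np.sqrt(q.shape[0])
    top = np.argsort(-scores)[:k]
    return softmax(scores[top]) @ ds_values[top]

rng = np.random.default_rng(0)
d, t = 8, 16                      # hidden size, history length (toy values)
keys = rng.normal(size=(t, d))    # datastore = key-value pairs of all
values = rng.normal(size=(t, d))  # preceding tokens in this sequence
q = rng.normal(size=d)

full = causal_attention(q, keys, values)
knn = knn_attention(q, keys, values, k=t)  # k covers the whole history
assert np.allclose(full, knn)              # identical outputs

sparse = knn_attention(q, keys, values, k=4)  # k < t: sparse attention
```

Because softmax is permutation-equivariant, restricting it to the retrieved indices with k equal to the history length changes nothing; shrinking k yields attention over a learned-similarity subset of past tokens, the sparse-attention behavior described above.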
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
k-NN Memory Retrieval
Integrating k-NN Memory with Local Memory in Attention
Populating a k-NN Datastore for Language Modeling
k-NN Language Modeling (k-NN LM)
Vector Database
A language model is designed to be a question-answering assistant for a large corporate knowledge base containing thousands of separate project documents. A user asks a question about 'Project Alpha,' but the most relevant technical detail needed to answer it is located in a document for 'Project Zeta,' a completely unrelated past project. Which statement best explains the unique advantage of using a k-nearest neighbors (k-NN) based external memory system in this scenario?
Analyzing Long-Range Consistency in Language Models
In a k-NN-based external memory system, the datastore of key-value pairs is limited to representing only the context states from the current, single sequence being processed.
Learn After
An engineer is designing a language model that uses a retrieval-based component for its attention mechanism. They observe that under a specific configuration, this retrieval-based model behaves identically to a sparse attention model that only considers previous tokens within the same input sequence. Which of the following configurations of the retrieval component's datastore would cause this functional equivalence?
A k-NN-based attention model will produce identical outputs to a sparse attention model if its datastore is populated with key-value pairs from a large, external corpus of text that is different from the current input sequence.
Condition for Equivalence in Attention Models
Architectural Trade-offs in Attention Mechanisms