Relation

Equivalence Between k-NN and Sparse Attention Models

For standard language modeling tasks, the context typically consists of all tokens seen so far in a sequence, so the key-value pairs of all preceding tokens are retained in the datastore. When a k-NN-based attention model operates over such a datastore of the sequence history, it is essentially equivalent to a sparse attention model: retrieving the k nearest keys and attending only over them yields the same result as computing full attention with a mask that keeps only those k positions. In this sense, consulting an external retrieval datastore of past tokens and applying a sparse attention pattern are functionally interchangeable.
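To make the equivalence concrete, here is a minimal NumPy sketch comparing the two views. The function names, single attention head, and omitted scaling factor are illustrative assumptions rather than an implementation from the text: k-NN attention softmaxes over the k retrieved keys only, while the sparse variant masks all non-retrieved positions to negative infinity before the softmax, and the two produce the same output.

```python
import numpy as np

def knn_attention(q, datastore_keys, datastore_values, k):
    """k-NN attention: retrieve the k most similar keys from the
    datastore, then attend only over that retrieved subset."""
    # Similarity scores between the query and every stored key.
    scores = datastore_keys @ q                      # shape (n,)
    # Indices of the top-k keys -- the retrieved "neighbors".
    topk = np.argsort(scores)[-k:]
    # Softmax restricted to the retrieved subset.
    weights = np.exp(scores[topk] - scores[topk].max())
    weights /= weights.sum()
    return weights @ datastore_values[topk]

def sparse_attention(q, keys, values, mask):
    """Sparse attention: full attention where masked-out positions
    get a score of -inf and therefore zero attention weight."""
    scores = keys @ q
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores[mask].max())
    weights /= weights.sum()
    return weights @ values

# When the datastore holds exactly the sequence history, k-NN
# attention equals sparse attention whose mask keeps the top-k keys.
rng = np.random.default_rng(0)
n, d, k = 16, 8, 4                      # toy sizes, chosen arbitrarily
keys = rng.normal(size=(n, d))          # cached keys of past tokens
values = rng.normal(size=(n, d))        # cached values of past tokens
q = rng.normal(size=d)                  # current query

mask = np.zeros(n, dtype=bool)
mask[np.argsort(keys @ q)[-k:]] = True  # keep only the k nearest keys

assert np.allclose(knn_attention(q, keys, values, k),
                   sparse_attention(q, keys, values, mask))
```

The assertion passes because both paths normalize the same k exponentiated scores; the only difference is whether the non-retrieved positions are dropped before the softmax (retrieval) or zeroed out by the mask (sparsity).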
