1Cademy - Datastore Composition in k-NN Language Models

Learn Before

Extending k-NN Datastore Context with a Training Dataset

Concept

Datastore Composition in k-NN Language Models

The datastore of a k-NN language model is a collection of key-value tuples, each linking a context representation (the key) to its ground-truth next token (the value). A set of such tuples is written $\{(\mathbf{Z}_1, w_1), \ldots, (\mathbf{Z}_k, w_k)\}$. Each key $\mathbf{Z}_i$ is the final hidden state vector produced by the LLM's Transformer at position $i$, and the value $w_i$ is the token that actually follows that position in the sequence. The datastore is populated by running the model over a large training corpus and collecting a $(\mathbf{Z}_i, w_i)$ pair at every token position.
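The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `toy_encoder` is a hypothetical stand-in for the Transformer (a real system would return the model's final hidden state), and `build_datastore`, the tiny corpus, and the 4-dimensional vectors are all assumptions made for the example. The sketch also assumes a pair is stored only at positions that have a following token.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Datastore = List[Tuple[Vector, str]]


def toy_encoder(prefix: Sequence[str], dim: int = 4) -> Vector:
    """Hypothetical stand-in for the LLM: map a context to a fixed-size vector.

    A real k-NN LM would return the Transformer's final hidden state here.
    """
    vec = [0.0] * dim
    for pos, token in enumerate(prefix):
        for ch in token:
            # Deterministic toy "embedding": bucket characters by position.
            vec[(pos + ord(ch)) % dim] += 1.0
    return tuple(vec)


def build_datastore(
    corpus: Sequence[Sequence[str]],
    encode: Callable[[Sequence[str]], Vector] = toy_encoder,
) -> Datastore:
    """Collect one (key, value) pair per position that has a next token."""
    datastore: Datastore = []
    for tokens in corpus:
        for i in range(1, len(tokens)):
            key = encode(tokens[:i])   # Z_i: representation of the context
            value = tokens[i]          # w_i: the token that actually follows
            datastore.append((key, value))
    return datastore


# Illustrative corpus (assumed, not from the source).
corpus = [["the", "cat", "sat"], ["the", "dog", "ran", "home"]]
datastore = build_datastore(corpus)

for key, value in datastore:
    print(key, "->", value)
```

Note that the two sequences share the context "the", so the datastore holds the same key twice with different values ("cat" and "dog"). Keeping both entries is what lets retrieval later surface several plausible next tokens for similar contexts.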

Updated 2026-04-23
