Learn Before
Retrieving Reference Tokens in k-NN LM Inference
During inference in a k-nearest neighbors (k-NN) language model, the process begins with the model's hidden state representation for a given prefix, denoted as h. This representation is used to search the datastore for the k closest matching data items, which take the form of key-value tuples (k_i, v_i). The retrieved values v_i serve as reference tokens, guiding the model's prediction of the subsequent token based on the prefix representation h.
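The retrieval step above can be sketched in a few lines. This is a minimal illustration, not the full kNN-LM implementation: it assumes a small in-memory datastore of context representations (keys) paired with their following tokens (values), and uses squared L2 distance as the similarity measure; real systems use an approximate-nearest-neighbor index over millions of entries. All names (`knn_retrieve`, the toy keys and values) are hypothetical.

```python
import numpy as np

def knn_retrieve(h, keys, values, k=3):
    """Retrieve the k reference tokens whose stored context
    representations (keys k_i) are closest to the query h.

    h:      (d,) hidden-state vector for the current prefix
    keys:   (N, d) array of stored context representations
    values: length-N list of the tokens v_i that followed each context
    """
    # Squared L2 distance from the query to every stored key
    dists = np.sum((keys - h) ** 2, axis=1)
    # Indices of the k nearest keys
    nearest = np.argsort(dists)[:k]
    return [values[i] for i in nearest], dists[nearest]

# Toy datastore: four stored contexts with 2-d representations
keys = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
values = ["discovery", "advance", "finding", "banana"]

# Query with the hidden state of the current prefix
tokens, dists = knn_retrieve(np.array([0.1, 0.1]), keys, values, k=3)
print(tokens)  # the k reference tokens nearest to the query
```

Note that the model never compares tokens directly: selection is driven entirely by distance between the query vector and the stored keys, which is why semantically related continuations tend to be retrieved together.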
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Retrieving Reference Tokens in k-NN LM Inference
A language model architecture is designed to predict the next token by using two parallel computational streams that originate from the same query vector. The first stream uses the immediate, local context to generate a probability distribution over the vocabulary. The second stream uses the query vector to search a large external datastore, find the most similar historical contexts, and generate a second probability distribution based on the tokens that followed those contexts. The two distributions are then combined to produce the final prediction. What is the primary functional distinction between the information provided by these two streams?
Visual Representation of k-NN Language Model Inference
Diagnosing an Error in a Hybrid Language Model
A language model architecture enhances its predictions by combining information from its immediate context with knowledge from a large external repository. Arrange the following steps to accurately describe the data flow during its inference process.
Learn After
Using Reference Tokens to Define a Vocabulary Distribution in k-NN LM
Role of Internal State in Datastore Search
A language model enhanced with a nearest-neighbor mechanism needs to find relevant information from its external datastore to help predict the next word. Arrange the following steps in the correct chronological order to describe how the model retrieves this information.
A language model enhanced with a nearest-neighbor search mechanism is generating text. The model's current internal state, representing the prefix 'The scientist made a groundbreaking...', is used as a query to search an external datastore. The datastore contains pairs of (context representation, associated word). If the search retrieves the three words 'discovery', 'advance', and 'finding' as reference tokens, which statement most accurately describes how these specific words were selected?