1Cademy - Characterizing a Retrieval-Based Probability Distribution

Learn Before

Using Reference Tokens to Define a Vocabulary Distribution in k-NN LM

Short Answer


A language model uses a retrieval mechanism to improve its predictions. To predict the next word for the prefix 'The cat chased the mouse under the...', the model finds the 5 most similar contexts from a large text collection. The words that followed these 5 contexts were: sofa, chair, sofa, sofa, table. Based only on this set of 5 retrieved words, describe the key characteristics of the new probability distribution that would be formed over the vocabulary. Specifically, which words would have high, low, and zero probability?
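As a minimal sketch of the distribution in question, assuming every retrieved neighbor counts equally (each of the k = 5 neighbors contributes 1/5 to the word that followed it), the retrieval-based distribution is just normalized counts over the vocabulary. The function name `knn_distribution` and the small vocabulary below are hypothetical, chosen only for illustration.

```python
from collections import Counter


def knn_distribution(retrieved_tokens, vocabulary):
    """Equal-weight k-NN distribution (assumption: uniform neighbor weights).

    Each retrieved neighbor adds 1/k to the word that followed its context.
    """
    k = len(retrieved_tokens)
    counts = Counter(retrieved_tokens)
    # Counter returns 0 for missing keys, so words never retrieved get exactly zero.
    return {word: counts[word] / k for word in vocabulary}


# Hypothetical toy vocabulary; a real model's vocabulary is far larger.
vocab = ["sofa", "chair", "table", "bed", "rug", "mouse"]
retrieved = ["sofa", "chair", "sofa", "sofa", "table"]

dist = knn_distribution(retrieved, vocab)
for word, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{word:>6}: {p:.1f}")
```

Under this assumption "sofa" gets 3/5 = 0.6, "chair" and "table" get 1/5 = 0.2 each, and every other word gets 0. In the usual k-NN LM formulation each neighbor is instead weighted by a softmax over negative distances, which can shift how much mass "sofa" receives relative to "chair" and "table", but words that were never retrieved still receive zero probability from the retrieval component.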


Updated 2025-10-01
