Softmax-based k-NN Probability Distribution
In a k-nearest neighbors (k-NN) language model, a retrieval-based probability distribution is defined over the vocabulary V. Given a hidden state representation h, this distribution is computed by applying the Softmax function to a vector of negative distances: Softmax([-d_0, ..., -d_|V|]). Here, d_v represents the distance between h and the retrieved key if the corresponding reference token matches the v-th entry of the vocabulary V; otherwise, d_v takes a separate default value.
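Written out per token, the Softmax over negative distances gives each vocabulary entry a probability proportional to exp(-d_v), where d_v is the distance entry for the v-th vocabulary token. The expansion below is only a sketch: the name Pr_kNN and the symbols h, v and v' are notational assumptions, not taken from the text above.

```latex
\mathrm{Pr}_{k\mathrm{NN}}(v \mid h) \;=\; \frac{\exp(-d_v)}{\sum_{v'=0}^{|V|} \exp(-d_{v'})}
```

Because exp(-d) shrinks as d grows, a token with a smaller distance receives a larger share of the probability mass.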

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Calculating Aggregated Distances from Nearest Neighbors
A k-NN Language Model retrieves the 4 nearest neighbors from its datastore for a given query hidden state. The retrieved neighbors, their corresponding token values, and their distances to the query are listed below:
- Neighbor 1: Value = 'cat', Distance = 0.2
- Neighbor 2: Value = 'dog', Distance = 0.3
- Neighbor 3: Value = 'cat', Distance = 0.5
- Neighbor 4: Value = 'fish', Distance = 0.6
Based on this information, what is the aggregated distance, d_v, for the vocabulary token 'cat'?
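The grouping step behind this question can be sketched in Python. The text does not state how distances from several neighbors sharing a token are combined, so the helper below (aggregate_distances, a hypothetical name) takes the reduction rule as a parameter and shows two assumed choices, summing and taking the minimum, without claiming either is the intended one.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def aggregate_distances(
    neighbors: Iterable[Tuple[str, float]],
    reduce_fn: Callable[[List[float]], float],
) -> Dict[str, float]:
    """Group neighbor distances by token, then reduce each group to one value."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for token, distance in neighbors:
        grouped[token].append(distance)
    return {token: reduce_fn(dists) for token, dists in grouped.items()}


# The four retrieved neighbors listed in the question above.
neighbors = [("cat", 0.2), ("dog", 0.3), ("cat", 0.5), ("fish", 0.6)]

# Two hypothetical aggregation rules; the text does not say which one applies.
by_sum = aggregate_distances(neighbors, sum)
by_min = aggregate_distances(neighbors, min)

print("sum rule:", by_sum)
print("min rule:", by_min)
```

Under the summing assumption, 'cat' combines both of its neighbors (0.2 and 0.5, about 0.7 in total); under the minimum assumption only the closest neighbor (0.2) counts. Tokens with a single neighbor, such as 'dog' and 'fish', are unaffected by the choice.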
Analyzing Prediction Outcomes via Neighbor Distances
Learn After
A language model computes a probability distribution over its vocabulary by transforming a vector of aggregated distances. The transformation is applied to the negative of these distances, such that smaller distances result in higher probabilities. Given the following aggregated distances for four vocabulary tokens, which token will be assigned the highest probability?
- Token 'alpha': 5.2
- Token 'beta': 1.8
- Token 'gamma': 9.4
- Token 'delta': 2.1
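The four aggregated distances above can be run through a Softmax over their negatives to see how the ranking comes out. This is a minimal sketch: knn_probabilities is a hypothetical helper name, and subtracting the maximum logit is a standard numerical-stability step rather than something the text prescribes.

```python
import math
from typing import Dict


def knn_probabilities(distances: Dict[str, float]) -> Dict[str, float]:
    """Softmax over negated distances, keyed by token."""
    logits = {token: -d for token, d in distances.items()}
    # Shifting by the largest logit avoids overflow and leaves the result unchanged.
    max_logit = max(logits.values())
    exps = {token: math.exp(z - max_logit) for token, z in logits.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}


# Aggregated distances from the question above.
distances = {"alpha": 5.2, "beta": 1.8, "gamma": 9.4, "delta": 2.1}
probs = knn_probabilities(distances)
best = max(probs, key=probs.get)

for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token}: {p:.4f}")
print("highest probability:", best)
```

Softmax preserves the ordering of its inputs, so the token with the smallest distance ('beta', at 1.8) ends up with the largest probability, and 'gamma' (9.4) with the smallest.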
True or False: In the process of creating a probability distribution from k-nearest neighbors, applying the Softmax function directly to the vector of positive aggregated distances (e.g., [d_0, ..., d_|V|]) would result in tokens with larger distances being assigned higher probabilities.
Debugging a k-NN Probability Calculation
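The effect of the sign on the Softmax input, as raised in the true-or-false statement above, can be checked numerically. The sketch below compares Softmax over negated distances with Softmax over the raw positive distances; the three distance values are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable Softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical aggregated distances, not taken from the text.
distances = [0.5, 1.0, 3.0]

probs_negated = softmax([-d for d in distances])  # negated input: nearer tokens favored
probs_positive = softmax(distances)               # sign dropped: farther tokens favored

print("negated :", [round(p, 4) for p in probs_negated])
print("positive:", [round(p, 4) for p in probs_positive])
```

With the negated input the smallest distance gets the largest probability; with the positive input the ordering flips and the largest distance wins. A missing minus sign is therefore a plausible first thing to check when a k-NN distribution appears to favor distant neighbors.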