Calculating Aggregated Distances from Nearest Neighbors
A language model is predicting the next word. It has a vocabulary of { 'cat', 'dog', 'fox', 'hen' }. After processing a context, it retrieves the 5 nearest neighbors from its datastore, listed below with their distances and associated word tokens:
- Neighbor 1: (distance=0.2, token='dog')
- Neighbor 2: (distance=0.4, token='fox')
- Neighbor 3: (distance=0.5, token='cat')
- Neighbor 4: (distance=0.7, token='dog')
- Neighbor 5: (distance=0.9, token='fox')
Suppose the aggregated distance for a vocabulary token is defined as the distance to its closest matching neighbor in the retrieved set. What is the aggregated distance for each token in the vocabulary? (Note: if a token does not appear among the neighbors, its aggregated distance is effectively infinite.)
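The min-over-matching-neighbors rule can be checked with a short computation. The following is a minimal sketch, assuming neighbors are stored as (distance, token) pairs. The helper name `aggregate_min_distance` is hypothetical and used only for illustration. Tokens that are never retrieved start at infinity and stay there.

```python
import math

# Retrieved neighbors as (distance, token) pairs, copied from the list above.
neighbors = [
    (0.2, "dog"),
    (0.4, "fox"),
    (0.5, "cat"),
    (0.7, "dog"),
    (0.9, "fox"),
]
vocabulary = ["cat", "dog", "fox", "hen"]


def aggregate_min_distance(neighbors, vocabulary):
    """Hypothetical helper: map each vocabulary token to the distance of its
    closest matching neighbor; tokens absent from the neighbors get math.inf."""
    agg = {token: math.inf for token in vocabulary}
    for distance, token in neighbors:
        if token in agg:
            # Keep only the smallest distance seen for this token.
            agg[token] = min(agg[token], distance)
    return agg


print(aggregate_min_distance(neighbors, vocabulary))
# {'cat': 0.5, 'dog': 0.2, 'fox': 0.4, 'hen': inf}
```

Later matches for 'dog' (0.7) and 'fox' (0.9) do not change the result, because only the closest occurrence of each token counts.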
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Softmax-based k-NN Probability Distribution
Calculating Aggregated Distances from Nearest Neighbors
A k-NN language model retrieves the 4 nearest neighbors from its datastore for a given query hidden state. The retrieved neighbors, their token values, and their distances to the query are listed below:
- Neighbor 1: Value = 'cat', Distance = 0.2
- Neighbor 2: Value = 'dog', Distance = 0.3
- Neighbor 3: Value = 'cat', Distance = 0.5
- Neighbor 4: Value = 'fish', Distance = 0.6
Based on this information, what is the aggregated distance for the vocabulary token 'cat'?
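A minimal sketch of this lookup follows. It assumes the aggregated distance is the distance to the closest neighbor whose value matches the token, which is the definition used in the first question. The helper `aggregated_distance` is hypothetical. 'cat' appears twice (0.2 and 0.5), and only the smaller distance is kept.

```python
# Retrieved (value, distance) pairs from the list above.
neighbors = [("cat", 0.2), ("dog", 0.3), ("cat", 0.5), ("fish", 0.6)]


def aggregated_distance(token, neighbors):
    # Hypothetical helper: minimum distance among neighbors whose value equals
    # `token`; returns infinity if the token was not retrieved at all.
    return min((d for v, d in neighbors if v == token), default=float("inf"))


print(aggregated_distance("cat", neighbors))  # 0.2
```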
Analyzing Prediction Outcomes via Neighbor Distances