Learn Before
Debugging a k-NN Probability Calculation
A machine learning engineer is building a system that generates a probability distribution over a vocabulary based on the proximity of a query vector to its nearest neighbors. After implementation, they notice a critical bug: words that are farthest from the query (i.e., have the largest aggregated distances) are consistently being assigned the highest probabilities. The engineer's code for the final probability calculation step is probabilities = Softmax([d_token1, d_token2, ..., d_tokenN]), where d_token is the aggregated distance for each token. Based on the described method for converting distances to probabilities, what is the single, fundamental error in the engineer's implementation, and what specific change is needed to fix it?
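The bug and its fix can be illustrated with a minimal sketch (pure Python; the variable names and example distances are hypothetical, not from the engineer's actual code):

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical aggregated distances for three tokens.
distances = [3.0, 0.5, 7.2]

# Buggy: Softmax applied directly to distances, so the FARTHEST
# token (largest distance) receives the highest probability.
buggy = softmax(distances)

# Fix: negate the distances first, so the NEAREST token
# (smallest distance) receives the highest probability.
fixed = softmax([-d for d in distances])
```

The single change is the sign: `Softmax([-d_token1, ..., -d_tokenN])` instead of `Softmax([d_token1, ..., d_tokenN])`. Because Softmax is monotonically increasing in its inputs, negating the distances inverts the ranking so that proximity, not remoteness, maps to probability mass.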
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model computes a probability distribution over its vocabulary by transforming a vector of aggregated distances. The transformation is applied to the negative of these distances, such that smaller distances result in higher probabilities. Given the following aggregated distances for four vocabulary tokens, which token will be assigned the highest probability?
- Token 'alpha': 5.2
- Token 'beta': 1.8
- Token 'gamma': 9.4
- Token 'delta': 2.1
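Plugging the four values above into a softmax over negated distances (a quick check, not part of the original question) shows which token comes out on top:

```python
import math

# Aggregated distances from the question.
distances = {"alpha": 5.2, "beta": 1.8, "gamma": 9.4, "delta": 2.1}

# Softmax over the *negative* distances: smaller distance -> higher probability.
neg = {t: -d for t, d in distances.items()}
m = max(neg.values())
exps = {t: math.exp(v - m) for t, v in neg.items()}
total = sum(exps.values())
probs = {t: e / total for t, e in exps.items()}

best = max(probs, key=probs.get)
print(best)  # 'beta' — the smallest distance (1.8) gets the highest probability
```

Since the transformation negates distances before exponentiating, 'beta' (distance 1.8, the smallest) is assigned the highest probability, and 'gamma' (distance 9.4, the largest) the lowest.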
True or False: In the process of creating a probability distribution from k-nearest neighbors, applying the Softmax function directly to the vector of positive aggregated distances (e.g., [d_0, ..., d_|V|]) would result in tokens with larger distances being assigned higher probabilities.