Learn Before
Debugging a k-NN Probability Calculation
A machine learning engineer is building a system that generates a probability distribution over a vocabulary based on the proximity of a query vector to its nearest neighbors. After implementation, they notice a critical bug: words that are farthest from the query (i.e., have the largest aggregated distances) are consistently being assigned the highest probabilities. The engineer's code for the final probability calculation step is probabilities = Softmax([d_token1, d_token2, ..., d_tokenN]), where d_token is the aggregated distance for each token. Based on the described method for converting distances to probabilities, what is the single, fundamental error in the engineer's implementation, and what specific change is needed to fix it?
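The bug and its fix can be illustrated with a minimal sketch (pure Python; the variable names and example distances are hypothetical, not from the engineer's actual code):

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical aggregated distances for three tokens.
distances = [3.0, 0.5, 7.2]

# Buggy: Softmax applied directly to distances, so the FARTHEST
# token (largest distance) receives the highest probability.
buggy = softmax(distances)

# Fix: negate the distances first, so the NEAREST token
# (smallest distance) receives the highest probability.
fixed = softmax([-d for d in distances])
```

The single change is the sign: `Softmax([-d_token1, ..., -d_tokenN])` instead of `Softmax([d_token1, ..., d_tokenN])`. Because Softmax is monotonically increasing in its inputs, negating the distances inverts the ranking so that proximity, not remoteness, maps to probability mass.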
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model computes a probability distribution over its vocabulary by transforming a vector of aggregated distances. The transformation is applied to the negative of these distances, such that smaller distances result in higher probabilities. Given the following aggregated distances for four vocabulary tokens, which token will be assigned the highest probability?
- Token 'alpha': 5.2
- Token 'beta': 1.8
- Token 'gamma': 9.4
- Token 'delta': 2.1
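Plugging the four values above into a softmax over negated distances (a quick check, not part of the original question) shows which token comes out on top:

```python
import math

# Aggregated distances from the question.
distances = {"alpha": 5.2, "beta": 1.8, "gamma": 9.4, "delta": 2.1}

# Softmax over the *negative* distances: smaller distance -> higher probability.
neg = {t: -d for t, d in distances.items()}
m = max(neg.values())
exps = {t: math.exp(v - m) for t, v in neg.items()}
total = sum(exps.values())
probs = {t: e / total for t, e in exps.items()}

best = max(probs, key=probs.get)
print(best)  # 'beta' — the smallest distance (1.8) gets the highest probability
```

Since the transformation negates distances before exponentiating, 'beta' (distance 1.8, the smallest) is assigned the highest probability, and 'gamma' (distance 9.4, the largest) the lowest.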
True or False: In the process of creating a probability distribution from k-nearest neighbors, applying the Softmax function directly to the vector of positive aggregated distances (e.g., [d_0, ..., d_|V|]) would result in tokens with larger distances being assigned higher probabilities.