Learn Before
True or False: In the process of creating a probability distribution from k-nearest neighbors, applying the Softmax function directly to the vector of positive aggregated distances (e.g., [d_0, ..., d_|V|]) would result in tokens with larger distances being assigned higher probabilities.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model computes a probability distribution over its vocabulary by transforming a vector of aggregated distances. The transformation is applied to the negative of these distances, such that smaller distances result in higher probabilities. Given the following aggregated distances for four vocabulary tokens, which token will be assigned the highest probability?
- Token 'alpha': 5.2
- Token 'beta': 1.8
- Token 'gamma': 9.4
- Token 'delta': 2.1
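The effect of the sign can be checked numerically. The following is a minimal sketch, not a reference implementation: the `softmax` helper and the variable names are illustrative assumptions, and the distances are the four values listed above. It compares Softmax applied to the negated distances with Softmax applied to the raw positive distances.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    m = max(scores)                      # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Aggregated distances from the example above.
distances = {"alpha": 5.2, "beta": 1.8, "gamma": 9.4, "delta": 2.1}
tokens = list(distances)
d = [distances[t] for t in tokens]

# Softmax over the negated distances: smaller distance -> higher probability.
probs_negated = dict(zip(tokens, softmax([-x for x in d])))

# Softmax applied directly to the positive distances: the ordering is inverted.
probs_direct = dict(zip(tokens, softmax(d)))

best_negated = max(probs_negated, key=probs_negated.get)
best_direct = max(probs_direct, key=probs_direct.get)

print("negated distances:", best_negated)
print("raw distances:    ", best_direct)
```

Because Softmax is monotonically increasing in each input, negating the distances is what makes the closest token the most probable one; without the negation, the farthest token receives the largest share of probability mass.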
Debugging a k-NN Probability Calculation