Learn Before
Next Token Selection in k-NN Language Models
After computing the interpolated final probability distribution, P(y | x), a k-nearest neighbors (k-NN) language model selects the next token, ŷ. This selection is achieved by finding the specific token that maximizes the final probability, ŷ = argmax_y P(y | x), thereby choosing the most probable output based on the combined retrieval and model predictions.
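As a minimal sketch of this selection step (in Python, with an invented λ value and toy probabilities; the vocabulary and numbers below are illustrative, not from the original), the blend-then-argmax logic looks like this:

```python
# Minimal sketch: interpolate a base-LM distribution with a retrieval
# distribution, then greedily pick the argmax token.
# All numbers are invented for illustration.

lam = 0.25  # interpolation coefficient λ (assumed value)

# Hypothetical next-token distributions over a three-word vocabulary.
p_knn = {"cat": 0.6, "dog": 0.3, "fish": 0.1}  # retrieval-based (k-NN)
p_lm = {"cat": 0.1, "dog": 0.5, "fish": 0.4}   # base language model

# P(y | x) = λ * P_knn(y | x) + (1 - λ) * P_lm(y | x)
p_final = {tok: lam * p_knn[tok] + (1 - lam) * p_lm[tok] for tok in p_lm}

# ŷ = argmax_y P(y | x): the token with the highest blended probability.
next_token = max(p_final, key=p_final.get)
print(next_token)  # "dog" for these toy numbers
```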
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Next Token Selection in k-NN Language Models
A language model's final probability for a word is determined by blending its own internal prediction with a prediction based on retrieved text examples. The formula used is:
Final_Prob = λ * Retrieved_Prob + (1 - λ) * Internal_Prob. In a scenario where the model's internal prediction for the next word is 'innovative', but the most frequent word in similar retrieved examples is 'creative', how would the value of the coefficient λ influence the outcome?
Analyzing Component Influence in a k-NN Language Model
In a language model that combines its own predictions with information from retrieved examples using the formula
Final_Prob = λ * Retrieved_Prob + (1 - λ) * Base_LM_Prob, setting the coefficient λ to 0 results in the final prediction being determined entirely by the base language model, while setting λ to 1 determines it entirely by the retrieved examples.
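Both related items above hinge on how λ shifts weight between the two components. A small sketch (with toy probabilities invented for illustration) makes this concrete by sweeping λ from 0 to 1 and watching the prediction flip from the internal choice 'innovative' to the retrieved choice 'creative':

```python
# Sketch: sweeping λ to see the final prediction move between the base
# model's favorite ("innovative") and retrieval's favorite ("creative").
# Probabilities are invented toy values, not real model output.

internal = {"innovative": 0.6, "creative": 0.4}   # base LM prefers 'innovative'
retrieved = {"innovative": 0.3, "creative": 0.7}  # retrieval prefers 'creative'

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    final = {w: lam * retrieved[w] + (1 - lam) * internal[w] for w in internal}
    winner = max(final, key=final.get)
    print(f"λ = {lam:.2f} -> {winner}  {final}")

# λ = 0.00 reproduces the base model's prediction ('innovative');
# λ = 1.00 reproduces the retrieval prediction ('creative');
# for these toy numbers the flip happens at λ = 1/3.
```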
Learn After
A language model combines a base probability distribution, P_lm, with a retrieval-based distribution, P_knn, to predict the next token. The final probability is calculated by blending the two distributions as Final_Prob = λ * P_knn + (1 - λ) * P_lm, with interpolation coefficient λ = 0.6. Given the distributions below for a small vocabulary, which token will the model select as its final output?
- P_lm: {'wordA': 0.5, 'wordB': 0.4, 'wordC': 0.1}
- P_knn: {'wordA': 0.2, 'wordB': 0.7, 'wordC': 0.1}
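A quick way to check the answer is to compute the blend directly. The sketch below assumes, consistent with the formulas above, that λ weights the retrieval distribution P_knn:

```python
# Worked check for the exercise: blend P_lm and P_knn with λ = 0.6
# (λ on P_knn, as in the formulas above), then take the argmax.

lam = 0.6
p_lm = {"wordA": 0.5, "wordB": 0.4, "wordC": 0.1}
p_knn = {"wordA": 0.2, "wordB": 0.7, "wordC": 0.1}

p_final = {tok: lam * p_knn[tok] + (1 - lam) * p_lm[tok] for tok in p_lm}
print(p_final)  # ≈ {'wordA': 0.32, 'wordB': 0.58, 'wordC': 0.10}

# wordB has the highest blended probability, so it is selected.
print(max(p_final, key=p_final.get))  # wordB
```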
Analyzing Conflicting Model Predictions
Calculating the Interpolation Coefficient