Learn Before
Analyzing the Impact of the 'K' Parameter on Token Selection
Two language models are generating a continuation for the phrase: 'The best way to start the day is with a cup of...'. Both models use the same operator to select a set of the most probable next tokens before making a final choice. However, Model A is configured to select the top 2 candidates (K=2), while Model B is configured to select the top 5 candidates (K=5). Given the following simplified probability distribution over the vocabulary, identify the candidate set for each model and analyze how the difference in the size of 'K' influences the potential for generating either a predictable or a creative response.
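The question's own probability distribution is not reproduced on this page, so the sketch below uses a hypothetical distribution for "...a cup of" (the tokens and values are invented for illustration only). It shows how the candidate pool grows from K=2 to K=5, and how top-k sampling renormalizes over that pool before drawing one token. The helper names top_k_candidates and top_k_sample are hypothetical. This is a minimal sketch, not any particular library's implementation.

```python
import random

# Hypothetical next-token distribution for "...with a cup of".
# These values are illustrative; they are NOT the question's distribution.
probs = {
    "coffee": 0.45,
    "tea": 0.30,
    "water": 0.10,
    "juice": 0.08,
    "milk": 0.04,
    "soup": 0.03,
}

def top_k_candidates(dist, k):
    """Return the k tokens with the highest probabilities, most probable first."""
    return sorted(dist, key=dist.get, reverse=True)[:k]

def top_k_sample(dist, k, rng=random):
    """Keep only the top-k candidates, renormalize their probabilities, sample one."""
    candidates = top_k_candidates(dist, k)
    total = sum(dist[t] for t in candidates)
    weights = [dist[t] / total for t in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

for k in (2, 5):
    pool = top_k_candidates(probs, k)
    mass = sum(probs[t] for t in pool)
    print(f"K={k}: {pool} (covers {mass:.2f} of the probability mass)")
# K=2: ['coffee', 'tea'] (covers 0.75 of the probability mass)
# K=5: ['coffee', 'tea', 'water', 'juice', 'milk'] (covers 0.97 of the probability mass)
```

With K=2, every sample is one of the two most likely continuations, so the output stays predictable. With K=5, lower-probability tokens such as "juice" or "milk" can occasionally be drawn (e.g. top_k_sample(probs, 5, random.Random(0))), which makes the output more varied but also more likely to be unusual.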
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Mathematical Definition of Top-K Token Selection
Formal Derivation of the Top-k Selection Pool
A language model is generating text and needs to decide on the next token. It has calculated the following probabilities for a small set of possible tokens:
{'over': 0.12, 'the': 0.35, 'a': 0.28, 'under': 0.05, 'quick': 0.20}. If an operator is applied to this set to identify the K=3 tokens with the highest probability values, which set of tokens will be returned?
Analyzing the Impact of the 'K' Parameter on Token Selection
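One way to check this kind of selection is a short Python sketch that applies a top-K operator to the probabilities given in the question. The helper name top_k is hypothetical; it uses the standard library's heapq.nlargest, which returns the K largest items in descending order.

```python
import heapq

# Probabilities taken from the question above.
probs = {"over": 0.12, "the": 0.35, "a": 0.28, "under": 0.05, "quick": 0.20}

def top_k(dist, k):
    """Return the k (token, probability) pairs with the largest probabilities."""
    return heapq.nlargest(k, dist.items(), key=lambda item: item[1])

print(top_k(probs, 3))
# [('the', 0.35), ('a', 0.28), ('quick', 0.2)]
```

For a small vocabulary, sorting the whole dictionary would work just as well; heapq.nlargest avoids a full sort, which matters more when the vocabulary contains tens of thousands of tokens.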
When generating the next token in a sequence, applying an operator that identifies the K items with the highest values, with the parameter K set to 1, will produce a different set of candidate tokens than simply selecting the single token with the highest probability.
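To test the statement, the sketch below compares a top-K operator with K=1 against greedy selection (taking the single most probable token). It reuses the probability values from the earlier question about the K=3 pool; the helper names top_k_tokens and greedy_token are hypothetical.

```python
import heapq

# Probability values reused from the earlier K=3 question, for illustration.
probs = {"over": 0.12, "the": 0.35, "a": 0.28, "under": 0.05, "quick": 0.20}

def top_k_tokens(dist, k):
    """Tokens with the k largest probabilities, most probable first."""
    return [tok for tok, _ in heapq.nlargest(k, dist.items(), key=lambda item: item[1])]

def greedy_token(dist):
    """Greedy selection: the single most probable token."""
    return max(dist, key=dist.get)

print(top_k_tokens(probs, 1))                          # ['the']
print(greedy_token(probs))                             # the
print(top_k_tokens(probs, 1) == [greedy_token(probs)]) # True
```

In this sketch the K=1 candidate set contains exactly the greedy choice. When two tokens are tied for the highest probability, which one is returned depends on how each implementation breaks ties.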