Learn Before
Formal Derivation of the Top-k Selection Pool
The selection pool in top-k sampling, V_i, is formally derived by identifying the K tokens with the highest conditional probabilities from the entire vocabulary V at each generation step i. This selection process is formalized using the argTopK operator, which ranks the prediction probabilities of all possible next tokens and returns the K highest-ranked ones. The resulting selection pool is thus defined as:

V_i = argTopK_{y ∈ V} Pr(y | x, y_{<i})

where the probability is conditioned on the input x and the preceding token sequence y_{<i}.
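The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming the model's next-token probabilities Pr(y | x, y_{<i}) are given as a token-to-probability dict; the function name `arg_top_k` is illustrative, not from the text.

```python
def arg_top_k(probs, k):
    """Return the selection pool V_i: the set of the k tokens
    with the highest conditional probabilities."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return set(ranked[:k])

# Example probabilities for the next token at one generation step.
probs = {'the': 0.45, 'a': 0.20, 'cat': 0.12, 'dog': 0.08,
         'ran': 0.07, 'jumped': 0.05}

pool = arg_top_k(probs, 4)  # selection pool V_i for K=4
```

Note that the pool is a set of tokens, not of probability values; sampling then draws the next token from this restricted set.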

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formal Derivation of the Top-k Selection Pool
A language model is generating text and has calculated the following probabilities for the next potential token:
{'the': 0.45, 'a': 0.20, 'cat': 0.12, 'dog': 0.08, 'ran': 0.07, 'jumped': 0.05}. If the model is configured to sample its next choice from only the 4 most likely candidates, which set of tokens constitutes the selection pool?
Impact of Selection Pool Size on Text Generation
When generating text by sampling from a pool of the most probable candidate tokens, setting the pool size to 1 will produce the exact same output sequence as a method that always deterministically chooses the single token with the highest probability at every step.
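The claim above can be checked directly: with a pool of size 1, the only candidate is the argmax token, so sampling degenerates to greedy decoding. A minimal sketch, assuming the same dict-based probabilities as before (the helper `top_k_sample` is illustrative, not from the text):

```python
import random

def top_k_sample(probs, k, rng):
    """Sample one token from the K most probable candidates,
    weighted by their probabilities."""
    pool = sorted(probs, key=probs.get, reverse=True)[:k]
    weights = [probs[t] for t in pool]
    return rng.choices(pool, weights=weights, k=1)[0]

probs = {'over': 0.12, 'the': 0.35, 'a': 0.28, 'under': 0.05, 'quick': 0.20}
greedy = max(probs, key=probs.get)  # deterministic argmax choice

rng = random.Random(0)
# With k=1 the pool contains only the argmax token, so every draw matches it.
assert all(top_k_sample(probs, 1, rng) == greedy for _ in range(10))
```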
Mathematical Definition of Top-K Token Selection
Formal Derivation of the Top-k Selection Pool
A language model is generating text and needs to decide on the next token. It has calculated the following probabilities for a small set of possible tokens:
{'over': 0.12, 'the': 0.35, 'a': 0.28, 'under': 0.05, 'quick': 0.20}. If an operator is applied to this set to identify the K=3 tokens with the highest probability values, which set of tokens will be returned?
Analyzing the Impact of the 'K' Parameter on Token Selection
When generating the next token in a sequence, applying an operator that identifies the K items with the highest values, with the parameter K set to 1, will produce a different set of candidate tokens than simply selecting the single token with the highest probability.
Learn After
A language model is generating a sequence. At a specific step i, it computes the following probabilities for the next token over its vocabulary V = {'run', 'walk', 'jump', 'sit', 'sleep'}. Given a setting of K=3, which of the following sets correctly represents the selection pool V_i according to the formal definition?
Probabilities:
- Pr('run' | ...) = 0.15
- Pr('walk' | ...) = 0.40
- Pr('jump' | ...) = 0.05
- Pr('sit' | ...) = 0.35
- Pr('sleep' | ...) = 0.05
A developer is implementing the selection mechanism for a text generation model based on the formal definition V_i = argTopK_{y ∈ V} Pr(y | x, y_{<i}). For a vocabulary V = {'cat', 'dog', 'ran', 'sat'} and K=2, the model computes the next-token probabilities as: Pr('cat'|...) = 0.1, Pr('dog'|...) = 0.5, Pr('ran'|...) = 0.3, Pr('sat'|...) = 0.1. The developer's code returns the set {0.5, 0.3} as the selection pool V_i. What is the fundamental error in this output when compared to the formal definition?
Interpreting the Formal Definition of Top-k Selection
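The error probed by the question above can be made concrete in code: argTopK returns the top-K *tokens*, whereas the buggy code returns their probability *values*. A minimal sketch, assuming the probabilities from the question (variable names are illustrative):

```python
probs = {'cat': 0.1, 'dog': 0.5, 'ran': 0.3, 'sat': 0.1}
K = 2
ranked = sorted(probs, key=probs.get, reverse=True)

wrong_pool = {probs[t] for t in ranked[:K]}   # {0.5, 0.3}: probability values
correct_pool = set(ranked[:K])                # {'dog', 'ran'}: tokens, per argTopK
```

The formal definition requires a subset of the vocabulary V, so only the second result is a valid selection pool.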