Learn Before
Probability Renormalization Formula for Top-k Sampling
In top-k sampling, after identifying the pool of the k most probable tokens V_k, their probabilities are renormalized to form a new distribution that sums to 1. The renormalized probability of a token w from this pool is calculated by dividing its original probability by the sum of the original probabilities of all tokens in the pool:

P'(w) = P(w) / Σ_{w' ∈ V_k} P(w')

This ensures that the new probabilities for the tokens in V_k sum to 1.
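The renormalization step can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the function name `top_k_renormalize` and the example probabilities (taken from the related question below) are assumptions for demonstration.

```python
def top_k_renormalize(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities.

    probs: dict mapping token -> original probability P(w)
    k: number of tokens to keep in the pool V_k
    Returns a dict mapping each kept token to P(w) / sum of P over the pool.
    """
    # Select the k tokens with the highest original probability.
    pool = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Divide each kept probability by the sum over the kept pool.
    total = sum(p for _, p in pool)
    return {token: p / total for token, p in pool}

# Example: five candidates, k = 3. The pool is {the, a, one}, whose
# original probabilities sum to 0.7, so each is divided by 0.7.
probs = {"the": 0.4, "a": 0.2, "one": 0.1, "his": 0.05, "her": 0.05}
renormalized = top_k_renormalize(probs, 3)
# 'the' -> 0.4/0.7 ≈ 0.571, 'a' -> 0.2/0.7 ≈ 0.286, 'one' -> 0.1/0.7 ≈ 0.143
```

Note that each kept token's probability can only grow (or stay the same), since it is divided by a pool total that is at most 1.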

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Top-k Sampling with k=3
Top-k Selection Pool
Probability Renormalization Formula for Restricted Vocabulary Sampling
Probability Renormalization Formula for Top-k Sampling
A language model is generating the next word in a sequence and has calculated the initial probabilities for the five most likely candidates:
'the' (0.4), 'a' (0.2), 'one' (0.1), 'his' (0.05), and 'her' (0.05). If the model uses a sampling strategy where it only considers the top 3 most likely candidates (k=3), what will be the new, rescaled probability distribution for this reduced set of candidates from which the final word will be sampled?
Arrange the following actions into the correct sequence that describes the process of selecting the next token in a text generation model using the top-k sampling method.
Analyzing Text Generation Outputs
Learn After
A language model predicts the next token and assigns the following probabilities to the most likely candidates: 'the' (0.4), 'a' (0.2), 'one' (0.1), and 'some' (0.05). If the model is configured to only consider the top 3 most probable tokens for the next step, what is the adjusted probability for the token 'a' after the probabilities are recalculated to sum to 1?
Calculating Renormalized Probability
True or False: When a model identifies a small group of the most likely next words and then recalculates their probabilities so that they sum to 1, the new, recalculated probability for any given word in that group will always be greater than or equal to its original probability.