Learn Before
Probability Renormalization Formula for Restricted Vocabulary Sampling
In sampling-based decoding methods like top-k or top-p, after a restricted vocabulary $V'$ is selected, the probabilities of the tokens within this set are rescaled to form a new, valid probability distribution. The renormalized probability $P'(x)$ of a token $x \in V'$ is calculated by dividing its original conditional probability by the sum of the original probabilities of all tokens in the restricted set $V'$. This is expressed as:

$$P'(x \mid x_{<t}) = \frac{P(x \mid x_{<t})}{\sum_{x' \in V'} P(x' \mid x_{<t})}$$
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Top-k Sampling with k=3
Top-k Selection Pool
Probability Renormalization Formula for Top-k Sampling
A language model is generating the next word in a sequence and has calculated the initial probabilities for the five most likely candidates: the (0.4), a (0.2), one (0.1), his (0.05), and her (0.05). If the model uses a sampling strategy where it only considers the top 3 most likely candidates (k = 3), what will be the new, rescaled probability distribution for this reduced set of candidates from which the final word will be sampled?
Arrange the following actions into the correct sequence that describes the process of selecting the next token in a text generation model using the top-k sampling method.
Analyzing Text Generation Outputs
Learn After
A language model predicts the probabilities for the next word in a sequence. The top four candidates are: 'happy' (0.4), 'sad' (0.2), 'angry' (0.1), and 'joyful' (0.05). A decoding method is applied that restricts the possible choices to only the top three candidates ('happy', 'sad', 'angry'). After the probabilities for this smaller set are rescaled to form a new, valid probability distribution, what is the new probability for the word 'sad'?
Debugging a Sampling Algorithm
Impact of Vocabulary Set Size on Renormalized Probabilities