1Cademy - Forming the Candidate Pool in Top-p Sampling

Learn Before

Ranking and Top-p (Nucleus) Sampling Process

Activity (Process)

Forming the Candidate Pool in Top-p Sampling

The candidate pool for top-p sampling is built in two steps. First, all possible next tokens are sorted by their predicted probabilities in descending order. Second, starting with the highest-probability token, tokens are added to the pool one at a time, in that order, until their cumulative probability meets or exceeds the predefined threshold p. The result is the smallest set of top-ranked tokens whose combined probability reaches p.
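The two steps can be sketched in a few lines of Python. This is a minimal illustration, assuming the model's next-token probabilities are already available as a dictionary that maps token strings to probabilities. The function name `top_p_candidate_pool` and the example tokens are hypothetical and not part of any particular library.

```python
from typing import Dict, List, Tuple


def top_p_candidate_pool(probs: Dict[str, float], p: float) -> List[Tuple[str, float]]:
    """Hypothetical helper: build the top-p candidate pool as (token, prob) pairs."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")

    # Step 1: rank all tokens by probability, highest first.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    # Step 2: add tokens in ranked order until the running total reaches p.
    pool: List[Tuple[str, float]] = []
    cumulative = 0.0
    for token, prob in ranked:
        pool.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # threshold met or exceeded: the pool is complete
    return pool


# Example with made-up probabilities: 0.5 < 0.75, then 0.5 + 0.3 = 0.8 >= 0.75.
pool = top_p_candidate_pool({"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}, p=0.75)
print([token for token, _ in pool])  # ['cat', 'dog']
```

Python's `sorted` is stable, so tokens with equal probability keep their input order; that tie-breaking rule is an assumption of this sketch. If the supplied probabilities sum to less than p (for example, a truncated list), the loop simply returns every token it was given.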

Updated 2026-05-05

References

Reference of Foundations of Large Language Models Course

Learn After

A language model is generating the next word and has calculated the following probabilities for the most likely tokens: {'the': 0.40, 'a': 0.25, 'one': 0.15, 'it': 0.10, 'is': 0.05}. If the model uses a probability threshold of p = 0.70 to create a candidate pool for sampling, which set of tokens will be included in that pool?
A text generation model needs to create a candidate pool of tokens for its next selection based on a cumulative probability threshold. Arrange the following actions in the correct chronological order to accurately construct this pool.
Determining the Probability Threshold
