Ranking and Top-p (Nucleus) Sampling Process
Top-p (nucleus) sampling is a decoding strategy for generating text that selects from the smallest possible set of tokens whose cumulative probability exceeds a threshold p. The process involves three steps:
- Ranking: All potential next tokens are sorted by their predicted probability in descending order.
- Selection (Nucleus Formation): The probabilities of the top-ranked tokens are summed cumulatively until the total meets or exceeds the predefined threshold p. This set of tokens forms the 'nucleus', and all other tokens are discarded (pruned).
- Renormalization & Sampling: The probabilities of the tokens within the nucleus are rescaled so that they sum to 1. A final token is then randomly sampled from this new, smaller distribution to become the output.
For example, with a vocabulary of {'cute': 0.34, 'on': 0.32, 'sick': 0.21, ...} and a threshold of p = 0.6, the nucleus would be {'cute', 'on'} because their cumulative probability (0.34 + 0.32 = 0.66) is the first to exceed 0.6. Their probabilities would be renormalized to approximately 0.52 (0.34/0.66) and 0.48 (0.32/0.66), respectively, before one is sampled.
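The three steps above can be sketched in plain Python; this is a minimal illustration, not a production decoder, and the function name and the trailing probabilities filling out the example vocabulary are assumptions for demonstration:

```python
import random

def top_p_sample(probs, p=0.6, rng=random):
    """Sample a token via top-p (nucleus) sampling.

    probs: dict mapping token -> probability (assumed to sum to 1).
    """
    # 1. Ranking: sort tokens by probability, descending.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # 2. Nucleus formation: accumulate until the total meets or exceeds p;
    #    everything after that point is pruned.
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # 3. Renormalization: rescale nucleus probabilities to sum to 1,
    #    then sample one token from the smaller distribution.
    total = sum(prob for _, prob in nucleus)
    tokens = [t for t, _ in nucleus]
    weights = [prob / total for _, prob in nucleus]
    return rng.choices(tokens, weights=weights, k=1)[0]

# The worked example: with p = 0.6 the nucleus is {'cute', 'on'},
# so the sample is always one of those two tokens.
probs = {'cute': 0.34, 'on': 0.32, 'sick': 0.21, 'funny': 0.13}
print(top_p_sample(probs, p=0.6))  # prints either 'cute' or 'on'
```

Note that because the cutoff is applied to the cumulative sum, the nucleus size adapts to the shape of the distribution: a peaked distribution yields a small nucleus, a flat one a large nucleus.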

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences