Learn Before
Top-p (Nucleus) Sampling
Top-p sampling, also known as nucleus sampling, is a decoding method that selects the next token from a dynamically sized candidate pool. This pool is formed by identifying the smallest set of the most probable tokens whose cumulative probability exceeds a predefined threshold 'p' [Holtzman et al., 2020]. By constructing the candidate pool in this manner, the method avoids selecting low-probability tokens from the long tail of the distribution, which helps prevent the generation of incoherent or nonsensical text.
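A minimal sketch of this selection procedure, assuming the model's next-token distribution is available as a plain NumPy probability vector over the vocabulary; the function name `top_p_sample` and its arguments are illustrative, not taken from any particular library:

```python
import numpy as np

def top_p_sample(probs: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample a token index from the smallest set of most-probable tokens
    whose cumulative probability exceeds the threshold p."""
    rng = rng if rng is not None else np.random.default_rng()
    # Sort token probabilities from most to least probable.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # Find the smallest prefix whose cumulative probability strictly exceeds p.
    cumulative = np.cumsum(sorted_probs)
    cutoff = int(np.searchsorted(cumulative, p, side="right")) + 1
    nucleus_ids = order[:cutoff]
    nucleus_probs = sorted_probs[:cutoff]
    # Renormalize within the nucleus and sample from it.
    nucleus_probs = nucleus_probs / nucleus_probs.sum()
    return int(rng.choice(nucleus_ids, p=nucleus_probs))
```

With p = 1.0 this reduces to ordinary sampling from the full distribution; smaller values of p shrink the nucleus and cut off the long tail, while renormalizing within the nucleus keeps the remaining probabilities proportional to the model's original scores.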

Tags
Data Science
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Top-k Sampling
Top-p (Nucleus) Sampling
A team developing a language model for creative storytelling finds that its generated text is often repetitive and predictable, frequently getting stuck in loops (e.g., 'I am I am I am...'). Which of the following decoding strategies would be most effective at addressing this issue by introducing more variety into the generated text?
Analyzing Text Generation Outputs
Comparing Text Generation Strategies
When using a stochastic decoding method for text generation, the model is guaranteed to select the single token with the highest probability at each step.
Learn After
Ranking and Top-p (Nucleus) Sampling Process
Comparison of Top-p and Top-k Sampling
A language model is generating text and has calculated the following probabilities for the next potential token:
{'the': 0.40, 'a': 0.30, 'one': 0.15, 'an': 0.10, 'some': 0.05}. If the model uses a sampling method where it selects from the smallest set of the most likely tokens whose cumulative probability exceeds a threshold of p = 0.75, which set of tokens will it sample from? (A worked sketch of this cumulative cutoff appears after this list.)
Effect of Parameter 'p' on Text Generation
Dynamic Candidate Set in Probabilistic Text Generation
You are tuning decoding for an internal "meeting-n...
You’re deploying an LLM to draft customer-facing i...
You’re building an internal “RFP response drafter”...
You’re implementing an LLM feature that generates ...
Post-incident analysis: fixing repetition and truncation by tuning decoding
Debugging Decoding: Balancing Determinism, Diversity, and Length in a Regulated Product
Selecting and Justifying a Decoding Policy for Two Production Use Cases
Choosing a Decoding Configuration Under Latency, Diversity, and Length Constraints
Release-readiness decision: decoding configuration for a customer-facing summarization feature
Decoding policy decision for a multilingual support assistant under safety, latency, and verbosity constraints
Balancing Randomness and Coherence in Token Sampling
Using Temperature with Softmax to Control Randomness in Token Selection
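For the p = 0.75 question in the Learn After list above, a minimal worked sketch of the cumulative cutoff, using the probabilities given in the question; the variable names are illustrative:

```python
# Worked application of the top-p cutoff to the distribution from the
# question above (threshold p = 0.75). Probabilities are copied from the
# question; everything else is illustrative.
probs = {'the': 0.40, 'a': 0.30, 'one': 0.15, 'an': 0.10, 'some': 0.05}
p = 0.75

nucleus, cumulative = [], 0.0
# Walk tokens from most to least probable, stopping once the cumulative
# probability exceeds the threshold p.
for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    nucleus.append(token)
    cumulative += prob
    if cumulative > p:
        break

print(nucleus)  # ['the', 'a', 'one']  (cumulative probability 0.85 > 0.75)
```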