1Cademy - Top-p (Nucleus) Sampling Process

Learn Before

1. Flowcharts

Activity (Process)

Top-p (Nucleus) Sampling Process

Top-p, or nucleus, sampling is a probabilistic text generation technique that proceeds in several stages. First, in the expansion stage, the model assigns a probability to every candidate next token. Second, these tokens are ranked by probability, from highest to lowest. Third, a 'nucleus' is selected: the smallest set of top-ranked tokens whose cumulative probability reaches or exceeds a predefined threshold 'p'. The probabilities within this nucleus are then renormalized so that they sum to 1. Finally, a single token is sampled from this renormalized set to become the output. This method balances quality and diversity: it filters out the long tail of low-probability tokens while still choosing at random among the plausible ones.
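
The stages above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `nucleus` and `top_p_sample`, the dictionary representation of the distribution, and the toy probabilities are all assumptions made for the example. A real language model would supply probabilities over its full vocabulary.

```python
import random


def nucleus(probs, p):
    """Return the renormalized nucleus of a next-token distribution.

    probs: dict mapping token -> probability (assumed to sum to about 1).
    p:     cumulative-probability threshold, with 0 < p <= 1.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank: order tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Select: keep top-ranked tokens until their cumulative probability
    # reaches or exceeds p.
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize: rescale the kept probabilities so they sum to 1.
    return {token: prob / cumulative for token, prob in kept}


def top_p_sample(probs, p, rng=random):
    """Sample one token from the renormalized nucleus."""
    renormalized = nucleus(probs, p)
    tokens = list(renormalized)
    weights = list(renormalized.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical toy distribution over four candidate tokens.
toy = {"cat": 0.5, "dog": 0.25, "bird": 0.125, "fish": 0.125}
print(nucleus(toy, 0.7))       # nucleus contains only "cat" and "dog"
print(top_p_sample(toy, 0.7))  # randomly "cat" or "dog"
```

With p = 0.7, "cat" alone (0.5) falls short of the threshold, so "dog" is added (cumulative 0.75) and the selection stops. "bird" and "fish" form the discarded tail. After renormalization, "cat" is sampled with probability 2/3 and "dog" with probability 1/3.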