Ranking and Top-p (Nucleus) Sampling Process

Top-p (nucleus) sampling is a decoding strategy for text generation that samples from the smallest set of highest-probability tokens whose cumulative probability meets or exceeds a threshold p. The process has three steps:

  1. Ranking: All potential next tokens are sorted by their predicted probability in descending order.
  2. Selection (Nucleus Formation): The probabilities of the top-ranked tokens are summed cumulatively until the total meets or exceeds the predefined threshold, p. This set of tokens forms the 'nucleus,' and all other tokens are discarded (pruned).
  3. Renormalization & Sampling: The probabilities of the tokens within the nucleus are rescaled so that they sum to 1. A final token is then randomly sampled from this new, smaller distribution to become the output.
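
The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `top_p_nucleus` and `top_p_sample` are hypothetical, the input is assumed to be a dict mapping each token to its predicted probability, and the `'<rest>'` entry in the usage lines is a made-up placeholder standing in for the remaining probability mass.

```python
import random


def top_p_nucleus(probs, p):
    """Return the renormalized nucleus for a token -> probability mapping."""
    # Step 1 (ranking): sort tokens by probability, highest first.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # Step 2 (nucleus formation): keep tokens until the running total
    # meets or exceeds the threshold p; everything after that is pruned.
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break

    # Step 3a (renormalization): rescale kept probabilities to sum to 1.
    return {token: prob / total for token, prob in nucleus}


def top_p_sample(probs, p, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_nucleus(probs, p)
    # Step 3b (sampling): weighted random choice over the nucleus only.
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]


# Hypothetical distribution; '<rest>' is an assumed placeholder entry.
example = {"cute": 0.34, "on": 0.32, "sick": 0.21, "<rest>": 0.13}
print(top_p_nucleus(example, p=0.6))
print(top_p_sample(example, p=0.6))
```

Production decoders typically apply the same logic to a tensor of logits for a whole batch at once; the dictionary version here trades speed for readability.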

For example, given a next-token distribution {'cute': 0.34, 'on': 0.32, 'sick': 0.21, ...} and a threshold of p = 0.6, the nucleus is {'cute', 'on'}: 'cute' alone (0.34) falls short of 0.6, and adding 'on' brings the cumulative probability to 0.34 + 0.32 = 0.66, the first total to reach the threshold. Their probabilities are then renormalized to approximately 0.515 (0.34 / 0.66) and 0.485 (0.32 / 0.66), respectively, before one is sampled.
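
Written out, the renormalization in this example divides each nucleus probability by the nucleus total. The notation P' for the renormalized probability is an assumption introduced here for illustration:

```latex
\[
P'(\text{cute}) = \frac{0.34}{0.34 + 0.32} = \frac{0.34}{0.66} \approx 0.515,
\qquad
P'(\text{on}) = \frac{0.32}{0.34 + 0.32} = \frac{0.32}{0.66} \approx 0.485,
\qquad
P'(\text{cute}) + P'(\text{on}) = 1 .
\]
```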

Updated 2025-10-10

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences