Activity (Process)

Top-k Sampling Process

Top-k sampling is a decoding method for selecting the next token in a sequence by sampling from a reduced set of the most likely options. The process consists of several steps:

  1. Ranking: All potential next tokens from the vocabulary are ranked according to their predicted probabilities.
  2. Selection (Top-k): The vocabulary is truncated to include only the 'k' tokens with the highest probabilities. All other lower-probability tokens are discarded or 'pruned'. For example, if k=3, only the top three candidates are kept.
  3. Renormalization & Sampling: The probabilities of the selected top-k tokens are renormalized so that they sum to 1, by dividing each one by the total probability of the kept tokens. A final token is then chosen by sampling from this new, smaller probability distribution, which introduces randomness only among the most plausible choices. For instance, after ranking, the top 3 tokens might be 'cute' (Pr=0.34), 'on' (Pr=0.32), and 'sick' (Pr=0.21), which together sum to 0.87. Dividing each by 0.87 gives approximately 'cute' (Pr=0.39), 'on' (Pr=0.37), and 'sick' (Pr=0.24). Sampling from this new distribution might then select 'on' as the final output.
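
The three steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation. The function name `top_k_sample`, the dictionary input format, and the two low-probability tokens ('are', 'the') added to fill out the vocabulary are assumptions made for the example.

```python
import random


def top_k_sample(probs, k, rng=random):
    """Sample one token from the k most probable entries of `probs`.

    `probs` maps token -> probability (a hypothetical input format).
    Returns (chosen_token, renormalized_distribution).
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Step 1, ranking: sort tokens by probability, highest first.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    # Step 2, selection: keep only the k most probable tokens.
    top = ranked[:k]

    # Step 3a, renormalization: divide by the kept mass so values sum to 1.
    total = sum(p for _, p in top)
    renorm = {tok: p / total for tok, p in top}

    # Step 3b, sampling: draw one token according to the new weights.
    tokens = list(renorm)
    weights = list(renorm.values())
    chosen = rng.choices(tokens, weights=weights, k=1)[0]
    return chosen, renorm


# Illustrative distribution: the top three values follow the example above;
# 'are' and 'the' are made-up fillers so the probabilities sum to 1.
probs = {"cute": 0.34, "on": 0.32, "sick": 0.21, "are": 0.08, "the": 0.05}

token, dist = top_k_sample(probs, k=3, rng=random.Random(0))
print({t: round(p, 2) for t, p in dist.items()})
print("sampled token:", token)
```

Passing a seeded `random.Random` instance makes a run repeatable, while the default uses the module-level generator. With k=1 the procedure always returns the single most probable token, which is the same as greedy decoding.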

Updated 2025-10-10

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences