1Cademy - Constructing the Top-p Candidate Pool

Learn Before

Mathematical Representation of the Top-p Candidate Pool

Short Answer

Constructing the Top-p Candidate Pool

During a text generation step, a language model outputs the following probabilities for the next token from its vocabulary:

P('the') = 0.5, P('a') = 0.2, P('in') = 0.1, P('on') = 0.08, P('at') = 0.07, P('is') = 0.05

The sampling process is configured with a probability threshold 'p' of 0.75. Using standard set notation, what is the candidate pool, $\overline{V}_i$, for this step?
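
A minimal sketch of how such a pool could be computed, assuming the usual nucleus (top-p) rule: sort tokens by descending probability and keep the smallest prefix whose cumulative probability reaches 'p'. The function name `top_p_pool` and the float tolerance check are illustrative choices, not something this page specifies.

```python
import math


def top_p_pool(probs, p):
    """Return the shortest prefix of tokens, ranked by descending
    probability, whose cumulative probability reaches p.

    Assumption: the standard nucleus-sampling rule (cumulative mass >= p).
    """
    # Rank tokens from most to least probable (ties keep input order).
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    pool = []
    cumulative = 0.0
    for token, prob in ranked:
        pool.append(token)
        cumulative += prob
        # isclose guards against float round-off when the running sum
        # should equal p exactly (e.g. 0.5 + 0.2 vs. p = 0.7).
        if cumulative >= p or math.isclose(cumulative, p):
            break
    return pool


# The distribution from the question above.
probs = {"the": 0.5, "a": 0.2, "in": 0.1, "on": 0.08, "at": 0.07, "is": 0.05}
print(set(top_p_pool(probs, 0.75)))
```

The loop appends a token before testing the threshold, so the token that pushes the running sum past 'p' is itself part of the pool.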

Updated 2025-10-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

At a specific step 'i' in a text generation process, the model has calculated the following probabilities for the next token from a vocabulary of {A, B, C, D, E}:

P(A) = 0.40, P(B) = 0.30, P(C) = 0.15, P(D) = 0.10, P(E) = 0.05

If the sampling process uses a probability threshold 'p' of 0.8, which of the following sets correctly represents the candidate pool of tokens, denoted as $\overline{V}_i$?

Constructing the Top-p Candidate Pool

A language model's output probabilities for the next token are sorted in descending order. The candidate pool for sampling, represented as $\overline{V}_i = \{y_i^{\text{top1}}, \dots, y_i^{\text{topk}_p}\}$, is constructed by including all tokens whose individual probability is greater than the sampling threshold $p$.
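
One way to check a claim like this is to write the construction out. Assuming the usual nucleus (top-p) sampling definition, which this page does not restate, the pool size $k_p$ is determined by the cumulative probability of the sorted tokens. Here $y_i^{\text{top}j}$ denotes the $j$-th most probable token at step $i$, and $\Pr(\cdot)$ for the model's next-token probability is notation assumed for this sketch:

$$
k_p = \min\Big\{ k : \sum_{j=1}^{k} \Pr\big(y_i^{\text{top}j}\big) \ge p \Big\},
\qquad
\overline{V}_i = \{y_i^{\text{top1}}, \dots, y_i^{\text{topk}_p}\}
$$

Under this definition the comparison against $p$ applies to the running sum, so a token can belong to the pool even when its own probability is well below $p$.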
