Constructing the Top-p Candidate Pool
During a text generation step, a language model outputs the following probabilities for the next token from its vocabulary:
P('the') = 0.5, P('a') = 0.2, P('in') = 0.1, P('on') = 0.08, P('at') = 0.07, P('is') = 0.05
The sampling process is configured with a probability threshold p of 0.75. Using standard set notation, what is the candidate pool for this step?
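One way to check an answer is to run the usual top-p (nucleus) construction by hand: sort the tokens by descending probability and keep adding them until the running total reaches p. The Python sketch below does this for the distribution in the question. The function name top_p_pool is hypothetical, and the sketch assumes the common convention that the token which pushes the cumulative probability to or past p is included in the pool.

```python
def top_p_pool(probs, p):
    """Return the smallest prefix of tokens, sorted by descending probability,
    whose cumulative probability is at least p (assumed convention)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    pool, cumulative = [], 0.0
    for token, prob in ranked:
        pool.append(token)
        cumulative += prob
        if cumulative >= p:
            break  # the pool now covers at least p of the probability mass
    return pool


# Distribution and threshold taken from the question above.
probs = {"the": 0.5, "a": 0.2, "in": 0.1, "on": 0.08, "at": 0.07, "is": 0.05}
pool = top_p_pool(probs, 0.75)
print(pool)  # ['the', 'a', 'in']
```

The running totals are 0.5, 0.7, and 0.8, so the third token is the first one that brings the total to at least 0.75, giving the pool {'the', 'a', 'in'} under this convention.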
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Related
At a specific step 'i' in a text generation process, the model has calculated the following probabilities for the next token from a vocabulary of {A, B, C, D, E}:
P(A) = 0.40, P(B) = 0.30, P(C) = 0.15, P(D) = 0.10, P(E) = 0.05
If the sampling process uses a probability threshold p of 0.8, which of the following sets correctly represents the candidate pool of tokens?
Constructing the Top-p Candidate Pool
A language model's output probabilities for the next token are sorted in descending order. The candidate pool for sampling is constructed by including all tokens whose individual probability is greater than the sampling threshold p.
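The statement above describes a per-token cutoff, whereas the usual top-p (nucleus) rule compares p against the cumulative probability of the sorted tokens. The Python sketch below contrasts the two rules on the five-token distribution from the related question (threshold 0.8). Both helper names are hypothetical, and the cumulative version assumes the token that crosses the threshold is kept.

```python
def per_token_filter(probs, p):
    """Keep tokens whose individual probability exceeds p (the rule as stated above)."""
    return [token for token, prob in probs.items() if prob > p]


def cumulative_pool(probs, p):
    """Keep the smallest sorted prefix whose cumulative probability is at least p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    pool, cumulative = [], 0.0
    for token, prob in ranked:
        pool.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return pool


probs = {"A": 0.40, "B": 0.30, "C": 0.15, "D": 0.10, "E": 0.05}
by_token = per_token_filter(probs, 0.8)
by_cumulative = cumulative_pool(probs, 0.8)
print(by_token)       # []
print(by_cumulative)  # ['A', 'B', 'C']
```

With p = 0.8 no single token exceeds the threshold, so the per-token rule leaves nothing to sample from, while the cumulative rule stops at C, where the running total first reaches 0.85.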