Mathematical Definition of Top-K Token Selection
In beam search, given a parent node corresponding to the sequence prefix $y_{<i}$, the algorithm identifies the $K$ most probable next tokens from the vocabulary $V$. This selection is expressed mathematically as $\operatorname{argTopK}_{y \in V} \Pr(y \mid y_{<i})$, where $K$ is the beam width and $\operatorname{argTopK}$ is a function that evaluates the conditional probabilities of all possible next tokens and returns the $K$ candidates with the highest values.
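The selection step can be sketched in Python. This is a minimal illustration, assuming the model's conditional next-token probabilities for the current prefix have already been computed and are supplied as a dictionary from token to probability. The function name `arg_top_k` and the toy probabilities are hypothetical and not part of the course material.

```python
import heapq
from typing import Dict, List


def arg_top_k(probs: Dict[str, float], k: int) -> List[str]:
    """Return the k tokens with the highest probability, most probable first.

    `probs` maps each token in the vocabulary to its conditional probability
    given the current prefix (hypothetical input format for this sketch).
    If k exceeds the vocabulary size, every token is returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # heapq.nlargest behaves like sorted(..., reverse=True)[:k], so tokens with
    # equal probability keep their dictionary (insertion) order.
    return heapq.nlargest(k, probs, key=probs.get)


if __name__ == "__main__":
    # Toy next-token distribution for some prefix (illustrative numbers only).
    probs = {"apple": 0.05, "banana": 0.5, "cherry": 0.2, "date": 0.15, "elder": 0.1}
    print(arg_top_k(probs, 3))  # ['banana', 'cherry', 'date']
```

Each returned token would then be appended to the parent prefix to form one candidate sequence. A practical decoder usually works with log-probabilities over token IDs rather than a dictionary of strings; because the logarithm is monotonic, that does not change which $K$ tokens are selected.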

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Construction of Top-K Candidate Sequences in Beam Search
Mathematical Definition of Top-K Token Selection
A language model is generating text using a search algorithm. At a certain step, it has the partial sequence 'The cat sat on the' and calculates the following probabilities for the next word from its vocabulary:
Word | Probability
mat | 0.45
rug | 0.25
chair | 0.15
floor | 0.10
table | 0.03
window | 0.02
If the algorithm is configured to select the 3 most probable next words at this step, which set of words will be chosen to create new candidate sequences?
Debugging a Text Generation System
A text generation system is designed to explore multiple possible sentence continuations at each step. It does this by selecting a fixed number of the most probable next words from its entire vocabulary. Match each parameter setting or concept with its most likely consequence or definition.
Formal Derivation of the Top-k Selection Pool
A language model is generating text and needs to decide on the next token. It has calculated the following probabilities for a small set of possible tokens:
{'over': 0.12, 'the': 0.35, 'a': 0.28, 'under': 0.05, 'quick': 0.20}. If an operator is applied to this set to identify the $K=3$ tokens with the highest probability values, which set of tokens will be returned?
Analyzing the Impact of the 'K' Parameter on Token Selection
When generating the next token in a sequence, applying an operator that identifies the $K$ items with the highest values, with the parameter $K$ set to 1, will produce a different set of candidate tokens than simply selecting the single token with the highest probability.
Learn After
At a certain step in a sequence generation process, the probabilities for the next token over a vocabulary V = {'A', 'B', 'C', 'D', 'E'} are as follows: Pr('A')=0.1, Pr('B')=0.4, Pr('C')=0.05, Pr('D')=0.3, Pr('E')=0.15. If the selection process is defined by the function argTopK with K=3, which set of tokens will be selected?
Analyzing a Formalism for Token Selection
Construction of Top-K Candidate Sequences in Beam Search
Formula for Constructing Top-K Candidate Sequences
Evaluating a Token Selection Implementation