1Cademy - Mathematical Definition of Top-K Token Selection

Learn Before

Top-K Token Selection in Beam Search
argTopK Function

Formula

Mathematical Definition of Top-K Token Selection

In beam search, given a parent node corresponding to the sequence prefix $y_1 \ldots y_{i-1}$ (written $\mathbf{y}_{<i}$), the algorithm selects the $K$ most probable next tokens from the vocabulary $V$. This selection is expressed as $\left\{y_i^{\mathrm{top}1},\ldots,y_i^{\mathrm{top}K}\right\} = \mathop{\mathrm{argTopK}}_{y_i \in V} \Pr(y_i|\mathbf{x},\mathbf{y}_{<i})$, where $K$ is the beam width and $\mathrm{argTopK}$ is a function that evaluates the conditional probability of every candidate next token $y_i \in V$, given the input $\mathbf{x}$ and the prefix $\mathbf{y}_{<i}$, and returns the $K$ tokens with the highest probability.
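The definition above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function name `arg_top_k` and the dictionary representation of $\Pr(y_i|\mathbf{x},\mathbf{y}_{<i})$ are assumptions made here, and the toy distribution reuses the word/probability values listed further down this page.

```python
import heapq


def arg_top_k(probs, k):
    """Return the k tokens with the highest probability, most probable first.

    `probs` is a hypothetical stand-in for Pr(y_i | x, y_<i): a dict that maps
    each token in the vocabulary V to its conditional probability.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # heapq.nlargest iterates over the dict's keys (the tokens) and ranks
    # them by their probability; it returns at most k tokens.
    return heapq.nlargest(k, probs, key=probs.get)


# Toy next-token distribution (values taken from the table on this page).
next_token_probs = {
    "mat": 0.45,
    "rug": 0.25,
    "chair": 0.15,
    "floor": 0.10,
    "table": 0.03,
    "window": 0.02,
}

# With beam width K = 3, the three most probable tokens are kept.
print(arg_top_k(next_token_probs, 3))  # ['mat', 'rug', 'chair']
```

If $K$ exceeds the vocabulary size, this sketch simply returns every token in descending order of probability. How a real beam search treats that case, or ties between equally probable tokens, is not specified by the definition above.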

Updated 2026-05-03

References

Reference of Foundations of Large Language Models Course

Word	Probability
mat	0.45
rug	0.25
chair	0.15
floor	0.10
table	0.03
window	0.02

Learn After

At a certain step in a sequence generation process, the probabilities for the next token over a vocabulary V = {'A', 'B', 'C', 'D', 'E'} are as follows: Pr('A')=0.1, Pr('B')=0.4, Pr('C')=0.05, Pr('D')=0.3, Pr('E')=0.15. If the selection process is defined by the function argTopK with K=3, which set of tokens will be selected?
Analyzing a Formalism for Token Selection
Construction of Top-K Candidate Sequences in Beam Search
Formula for Constructing Top-K Candidate Sequences
Evaluating a Token Selection Implementation
