Formula

Mathematical Definition of Top-K Token Selection

In beam search, given a parent node corresponding to the sequence prefix $y_1 \ldots y_{i-1}$, the algorithm identifies the $K$ most probable next tokens from the vocabulary $V$. This selection is expressed mathematically as

$$\left\{y_i^{\mathrm{top}1}, \ldots, y_i^{\mathrm{top}K}\right\} = \mathop{\mathrm{argTopK}}_{y_i \in V} \Pr(y_i \mid \mathbf{x}, \mathbf{y}_{<i})$$

where $K$ is the beam width and $\mathrm{argTopK}$ is a function that evaluates the conditional probability of every possible next token, given the input $\mathbf{x}$ and the prefix $\mathbf{y}_{<i}$, and returns the $K$ highest-probability candidates.
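The selection for a single parent node can be illustrated with a minimal Python sketch. It assumes the model's conditional distribution $\Pr(y_i \mid \mathbf{x}, \mathbf{y}_{<i})$ for one fixed prefix is already available as a dictionary from token to probability. The function name `arg_top_k` and the toy vocabulary are hypothetical; a real decoder would typically operate on a probability (or log-probability) vector indexed by token id.

```python
import heapq
from typing import Dict, List, Tuple


def arg_top_k(next_token_probs: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k tokens with the highest probability.

    next_token_probs maps each candidate token y_i in the vocabulary V to
    Pr(y_i | x, y_<i) for one fixed parent prefix y_1 ... y_{i-1}.
    k plays the role of the beam width K.
    """
    if k < 1:
        raise ValueError("beam width k must be at least 1")
    # heapq.nlargest scans the whole vocabulary once and keeps the k best
    # (token, probability) pairs, ordered from most to least probable.
    return heapq.nlargest(k, next_token_probs.items(), key=lambda item: item[1])


# Toy distribution over a five-token vocabulary (illustrative values only).
probs = {"the": 0.40, "a": 0.25, "cat": 0.15, "dog": 0.12, "ran": 0.08}
print(arg_top_k(probs, k=3))  # [('the', 0.4), ('a', 0.25), ('cat', 0.15)]
```

Because argTopK depends only on the ranking of the candidates, applying it to log-probabilities instead of probabilities selects the same $K$ tokens, since the logarithm is monotonically increasing.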

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
