Example

Iterative Application of Argmax for Next Token Prediction

The argmax function is applied iteratively to select the most probable next token at each step of sequence generation. For a sequence beginning with the prefix $\langle s \rangle\ a$, the model first predicts the token $x_2$ by maximizing the conditional probability given $\langle s \rangle\ a$. The selected token (here $b$) is appended to the context, and the extended context is used to predict $x_3$, and so on. This step-by-step process is illustrated by the following sequence of operations:

  1. Predict the second token: $\argmax_{x_2 \in V} \Pr(x_2 \mid \langle s \rangle\ a)$
  2. Predict the third token: $\argmax_{x_3 \in V} \Pr(x_3 \mid \langle s \rangle\ a\ b)$
  3. Predict the fourth token: $\argmax_{x_4 \in V} \Pr(x_4 \mid \langle s \rangle\ a\ b\ c)$

This iterative selection, where each new token is chosen by maximizing its conditional probability based on the preceding context, is a core mechanism of greedy decoding in autoregressive models.
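The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a real language model: the lookup table `TOY_MODEL`, its probabilities, the end-of-sequence token `</s>`, and the helper names `next_token_distribution` and `greedy_decode` are all assumptions made up for this example. The table is chosen so that the argmax picks b and then c, matching the contexts in the steps above.

```python
# Hypothetical toy "model": maps a context (tuple of tokens) to
# Pr(next token | context). All probabilities are invented for illustration.
TOY_MODEL = {
    ("<s>", "a"): {"a": 0.1, "b": 0.6, "c": 0.2, "</s>": 0.1},
    ("<s>", "a", "b"): {"a": 0.2, "b": 0.1, "c": 0.5, "</s>": 0.2},
    ("<s>", "a", "b", "c"): {"a": 0.1, "b": 0.1, "c": 0.1, "</s>": 0.7},
}


def next_token_distribution(context):
    """Return the conditional distribution over the vocabulary for a context."""
    return TOY_MODEL[tuple(context)]


def greedy_decode(prefix, max_new_tokens=10, eos="</s>"):
    """Repeatedly append the argmax token until eos or the step limit."""
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(tokens)
        # argmax over the vocabulary: the token with the highest probability
        best = max(dist, key=dist.get)
        tokens.append(best)
        if best == eos:
            break
    return tokens


result = greedy_decode(["<s>", "a"])
print(result)  # ['<s>', 'a', 'b', 'c', '</s>']
```

Each pass through the loop corresponds to one of the numbered steps: the context grows by exactly one token, and the next argmax is taken over the distribution conditioned on that longer context. In a real autoregressive model, `next_token_distribution` would be a forward pass rather than a table lookup.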

Updated 2026-04-18

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences