1Cademy - Argmax Formula for Next Token Prediction

Learn Before

Next Token Prediction Task

Formula

Argmax Formula for Next Token Prediction

In next token prediction, a language model determines the most likely next token, $\hat{x}_i$, given the preceding context $x_0, \ldots, x_{i-1}$. It does so by selecting, from the entire vocabulary $\mathcal{V}$, the token that maximizes the conditional probability output by the model. Note that the result is a token, not a probability value. This selection is formally expressed as:

$\hat{x}_i = \operatorname*{arg\,max}_{x_i \in \mathcal{V}} \Pr(x_i \mid x_0, \ldots, x_{i-1})$
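
As a rough illustration of the formula, the sketch below picks the argmax token from a toy distribution. The vocabulary, the logits, and the helper names `softmax` and `argmax_token` are hypothetical and chosen only for this example; a real model would produce the logits from the context $x_0, \ldots, x_{i-1}$.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax_token(vocab, probs):
    """Return the token (not the probability) with the highest probability."""
    best_index = max(range(len(vocab)), key=lambda j: probs[j])
    return vocab[best_index]


# Hypothetical toy vocabulary and model scores for some fixed context.
vocab = ["cat", "dog", "mat", "sat"]
logits = [1.0, 0.5, 3.0, 0.2]

probs = softmax(logits)              # stands in for Pr(x_i | x_0, ..., x_{i-1})
x_hat = argmax_token(vocab, probs)   # the predicted next token
best_prob = max(probs)               # the max value, which is a different thing

print(x_hat)
```

Here `x_hat` is an element of the vocabulary, whereas `best_prob` is the probability attached to it; the formula above returns the former.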

Updated 2026-06-20

References

Reference of Foundations of Large Language Models Course

Learn After

A language model has processed the input sequence 'The sun is shining and the sky is' and must now predict the next word. It computes a probability for several words in its vocabulary. Given the formula next_word = argmax_{word in Vocabulary} P(word | 'The sun is shining and the sky is') and the following probability outputs, which word will the model select?
- P('blue' | context) = 0.85
- P('green' | context) = 0.05
- P('running' | context) = 0.02
- P('0.85' | context) = 0.08

Iterative Application of Argmax for Next Token Prediction

A language model is predicting the next token for the sequence 'The weather is'. It calculates that the probability for the token 'sunny' is 0.78, which is the highest probability for any token in its vocabulary. The selection process is defined by the formula: predicted_token = argmax_{token in Vocabulary} P(token | 'The weather is'). Based on this information, the output of the argmax operation is the numerical value 0.78.

Interpreting the Argmax Function in Token Selection

Left-to-Right Token Generation Process

Next Token Prediction Formula Using KV Cache
