Learn Before
Argmax Formula for Next Token Prediction
In next token prediction, a language model determines the most likely next token given the preceding context. It does so by selecting, from the entire vocabulary, the token that maximizes the conditional probability output by the model. This selection process is formally expressed as:

predicted_token = argmax_{token in Vocabulary} P(token | context)
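The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular model implements it: the four-word vocabulary, the raw scores in `toy_logits`, and the helper names `softmax` and `predict_next_token` are all hypothetical, chosen only to show the argmax step over a probability distribution.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {token: math.exp(score - peak) for token, score in logits.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}

def predict_next_token(probs):
    """Return the token (not its probability) with the highest probability."""
    return max(probs, key=probs.get)

# Hypothetical scores for a toy four-word vocabulary; not from a real model.
toy_logits = {"sunny": 3.0, "cold": 1.5, "purple": -1.0, "the": 0.2}
toy_probs = softmax(toy_logits)
predicted_token = predict_next_token(toy_probs)
print(predicted_token)
```

Note that the argmax returns the token itself. The probability assigned to that token can be looked up separately as `toy_probs[predicted_token]`.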
References
Reference of Foundations of Large Language Models Course
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Argmax Formula for Next Token Prediction
An autoregressive language model has just generated the text: 'The sun is shining and the sky is'. Based on the fundamental principle of its operation, what is the immediate next computational step the model must perform to determine the following word?
Characterizing Model Output for Next Token Prediction
An autoregressive language model is given the input sequence 'The cat sat on the'. Arrange the following steps in the correct chronological order that the model follows to generate the very next token.
Learn After
A language model has processed the input sequence 'The sun is shining and the sky is' and must now predict the next word. It computes a probability for several words in its vocabulary. Given the formula
next_word = argmax_{word in Vocabulary} P(word | 'The sun is shining and the sky is')
and the following probability outputs, which word will the model select?
- P('blue' | context) = 0.85
- P('green' | context) = 0.05
- P('running' | context) = 0.02
- P('0.85' | context) = 0.08
Iterative Application of Argmax for Next Token Prediction
A language model is predicting the next token for the sequence 'The weather is'. It calculates that the probability for the token 'sunny' is 0.78, which is the highest probability for any token in its vocabulary. The selection process is defined by the formula:
predicted_token = argmax_{token in Vocabulary} P(token | 'The weather is')
Based on this information, the output of the argmax operation is the numerical value 0.78.
Interpreting the Argmax Function in Token Selection
Left-to-Right Token Generation Process