Learn Before
Iterative Application of Argmax for Next Token Prediction
The argmax function is applied iteratively to select the most probable next token at each step of sequence generation. For a sequence beginning with the prefix x_1, the model first predicts the token x_2 by maximizing the conditional probability given x_1. It then uses this new context to predict x_3, and so on. This step-by-step process is illustrated by the following sequence of operations:
- Predict the second token: x_2 = argmax_{x in Vocabulary} P(x | x_1)
- Predict the third token: x_3 = argmax_{x in Vocabulary} P(x | x_1, x_2)
- Predict the fourth token: x_4 = argmax_{x in Vocabulary} P(x | x_1, x_2, x_3)
This iterative selection, where each new token is chosen by maximizing its conditional probability based on the preceding context, is a core mechanism of greedy decoding in autoregressive models.
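The loop described above can be sketched in a few lines of Python. The toy conditional distributions in `next_token_probs` are assumptions for illustration only; in a real model they would come from a softmax over the vocabulary.

```python
# Greedy decoding sketch: iteratively apply argmax to extend a sequence.
# The toy conditional distributions below are illustrative assumptions,
# standing in for a real model's softmax output.

def next_token_probs(context):
    """Return a toy conditional distribution P(token | context)."""
    table = {
        ("The",): {"cat": 0.6, "dog": 0.4},
        ("The", "cat"): {"sat": 0.7, "ran": 0.3},
        ("The", "cat", "sat"): {"down": 0.8, "up": 0.2},
    }
    return table[tuple(context)]

def greedy_decode(prefix, steps):
    context = list(prefix)
    for _ in range(steps):
        probs = next_token_probs(context)
        # argmax: pick the token with the highest conditional probability
        best = max(probs, key=probs.get)
        context.append(best)
    return context

print(greedy_decode(["The"], 3))  # ['The', 'cat', 'sat', 'down']
```

Each iteration conditions on everything generated so far, which is exactly the greedy, left-to-right behavior the bullets above describe.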
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A language model has processed the input sequence 'The sun is shining and the sky is' and must now predict the next word. It computes a probability for several words in its vocabulary. Given the formula
next_word = argmax_{word in Vocabulary} P(word | 'The sun is shining and the sky is')
and the following probability outputs, which word will the model select?
- P('blue' | context) = 0.85
- P('green' | context) = 0.05
- P('running' | context) = 0.02
- P('0.85' | context) = 0.08
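The selection above reduces to a single argmax over the listed probabilities. A minimal sketch (the dictionary simply restates the values given in the question):

```python
# Restating the probabilities above; argmax ranges over tokens, not values.
# Note '0.85' is itself a vocabulary token here, with probability 0.08.
probs = {"blue": 0.85, "green": 0.05, "running": 0.02, "0.85": 0.08}
next_word = max(probs, key=probs.get)  # token with the highest probability
print(next_word)  # blue
```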
Iterative Application of Argmax for Next Token Prediction
A language model is predicting the next token for the sequence 'The weather is'. It calculates that the probability for the token 'sunny' is 0.78, which is the highest probability for any token in its vocabulary. The selection process is defined by the formula:
predicted_token = argmax_{token in Vocabulary} P(token | 'The weather is'). Based on this information, the output of the argmax operation is the numerical value 0.78.
Interpreting the Argmax Function in Token Selection
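The key distinction the statement above probes is between argmax and max: argmax returns the token that attains the maximum, while max returns the probability value itself. A minimal sketch (the probabilities for tokens other than 'sunny' are assumed for illustration):

```python
# Distinguish argmax (which token) from max (how probable that token is).
probs = {"sunny": 0.78, "rainy": 0.12, "cold": 0.10}  # non-'sunny' values assumed
predicted_token = max(probs, key=probs.get)  # argmax -> a token
peak_probability = max(probs.values())       # max    -> a number
print(predicted_token, peak_probability)  # sunny 0.78
```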
Left-to-Right Token Generation Process
Learn After
An autoregressive language model generates text one token at a time. At each step, it chooses the single token with the highest conditional probability based on the entire sequence generated so far. The model starts with the context 'The dog' and must choose the next two tokens.
Given the following table of conditional probabilities, which sequence of two tokens will the model generate?
| Current Context | Next Token | Probability |
| --- | --- | --- |
| 'The dog' | 'barked' | 0.7 |
| 'The dog' | 'ran' | 0.2 |
| 'The dog' | 'ate' | 0.1 |
| 'The dog barked' | 'loudly' | 0.9 |
| 'The dog barked' | 'at' | 0.1 |
| 'The dog ran' | 'away' | 0.6 |
| 'The dog ran' | 'to' | 0.4 |

An autoregressive model generates a sequence of three tokens after an initial start token, <s>. It does this by selecting the single most probable token at each step based on the sequence generated so far. Arrange the following actions into the correct chronological order that the model follows.
Analyzing Suboptimal Text Generation
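The two-step greedy generation from the conditional-probability table above can be simulated directly. The nested dictionary is a transcription of that table, not new data:

```python
# Simulate two steps of greedy decoding over the table of conditional
# probabilities for the context 'The dog'.
table = {
    "The dog": {"barked": 0.7, "ran": 0.2, "ate": 0.1},
    "The dog barked": {"loudly": 0.9, "at": 0.1},
    "The dog ran": {"away": 0.6, "to": 0.4},
}

context = "The dog"
generated = []
for _ in range(2):
    probs = table[context]
    token = max(probs, key=probs.get)  # greedy argmax at each step
    generated.append(token)
    context = context + " " + token

print(generated)  # ['barked', 'loudly']
```

Because 'barked' wins the first step, the rows conditioned on 'The dog ran' are never reached; greedy decoding commits to one branch at each step.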