Learn Before
Interpreting the Argmax Function in Token Selection
A language model uses the formula predicted_token = argmax_{token ∈ V} P(token | context) to select the next token. Explain the role of each component of this formula (argmax, token ∈ V, and P(token | context)) and describe what the final output of the entire operation represents.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model has processed the input sequence 'The sun is shining and the sky is' and must now predict the next word. It computes a probability for several words in its vocabulary. Given the formula
next_word = argmax_{word in Vocabulary} P(word | 'The sun is shining and the sky is')
and the following probability outputs, which word will the model select?
- P('blue' | context) = 0.85
- P('green' | context) = 0.05
- P('running' | context) = 0.02
- P('0.85' | context) = 0.08
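The selection in this example can be sketched in a few lines of Python. This is only an illustration: the probabilities are copied from the list above, while the dictionary layout and the variable names are assumptions made for the sketch, not part of the original question.

```python
# Probabilities copied from the example above. Storing them in a dict and the
# names `probs` / `next_word` are illustrative assumptions.
probs = {
    "blue": 0.85,
    "green": 0.05,
    "running": 0.02,
    "0.85": 0.08,
}

# argmax over the vocabulary: return the word (dictionary key) whose
# probability (dictionary value) is the largest.
next_word = max(probs, key=probs.get)

print(next_word)
```

Passing `key=probs.get` makes `max` compare the probabilities while still returning the word itself, which is the behaviour the argmax notation describes.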
Iterative Application of Argmax for Next Token Prediction
A language model is predicting the next token for the sequence 'The weather is'. It calculates that the probability for the token 'sunny' is 0.78, which is the highest probability for any token in its vocabulary. The selection process is defined by the formula:
predicted_token = argmax_{token in Vocabulary} P(token | 'The weather is').
Based on this information, the output of the argmax operation is the numerical value 0.78.
Interpreting the Argmax Function in Token Selection
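For the 'The weather is' item, it may help to see max and argmax side by side. The sketch below is hedged: only the value 0.78 for 'sunny' comes from the item, and the other two tokens and their probabilities are hypothetical fillers chosen so that 'sunny' remains the most probable token.

```python
# Only 'sunny' = 0.78 comes from the item above; 'cold' and 'nice' and their
# probabilities are hypothetical values added for illustration.
weather_probs = {"sunny": 0.78, "cold": 0.12, "nice": 0.10}

# max returns the largest probability, which is a number.
highest_probability = max(weather_probs.values())

# argmax returns the token that attains that probability, which is a string.
predicted_token = max(weather_probs, key=weather_probs.get)

print(highest_probability, predicted_token)
```

The two results have different types: one is a probability, the other is a member of the vocabulary.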
Left-to-Right Token Generation Process