Learn Before
Next Token Prediction Task
When applying a trained language model, a common and fundamental task is next token prediction: finding the most likely token given the sequence of preceding context tokens. At each step, the model computes a probability distribution over the entire vocabulary, conditioned on the preceding context. Generation then proceeds sequentially: the model selects the most probable next token from each distribution, appends it to the sequence, and repeats.
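The loop described above can be sketched as follows. This is a minimal illustration, not an actual model: the vocabulary and logit values are made up, and a real language model would produce the logits from the context with a neural network.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_next_token(vocab, logits):
    """Select the single most probable next token (greedy decoding)."""
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs

# Illustrative vocabulary and logits (assumed values, not from a trained model).
vocab = ["blue", "green", "bright", "falling"]
token, probs = greedy_next_token(vocab, [3.0, 1.4, 0.8, -0.6])
print(token)  # -> blue
```

In practice the chosen token is appended to the context and the model is queried again, repeating until an end-of-sequence token or a length limit is reached.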

References
Reference of Foundations of Large Language Models Course
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Next Token Prediction Task
Token Sampling from a Conditional Probability Distribution
Using Temperature with Softmax to Control Randomness in Token Selection
A language model is generating text and has produced the sequence 'The sky is'. It then calculates the following probability distribution for the next potential token:
{'blue': 0.75, 'green': 0.15, 'bright': 0.08, 'falling': 0.02}. If the model is configured to always select the single token with the highest probability, which token will it choose next?
Analyzing Token Selection Strategies
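With the distribution from the question, the "always pick the highest probability" rule reduces to an argmax over the token probabilities, which a short sketch makes concrete:

```python
# Probability distribution from the question above.
dist = {'blue': 0.75, 'green': 0.15, 'bright': 0.08, 'falling': 0.02}

# Greedy selection: the token with the highest probability.
choice = max(dist, key=dist.get)
print(choice)  # -> blue
```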
A language model is generating text and encounters the same input sequence on two separate occasions, producing two different probability distributions for the next token, shown below.
- Distribution A: {'meal': 0.90, 'dish': 0.05, 'surprise': 0.03, 'error': 0.02}
- Distribution B: {'soup': 0.30, 'stew': 0.25, 'salad': 0.22, 'dessert': 0.23}
Which of the following statements provides the most accurate analysis of these two distributions regarding the token selection process?
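One way to compare the two distributions quantitatively is their Shannon entropy: a peaked distribution like A has low entropy (greedy selection is nearly certain), while a flat distribution like B has high entropy (the top token wins by only a small margin). This sketch is illustrative and goes beyond what the question itself requires:

```python
import math

def entropy(dist):
    """Shannon entropy in bits; higher means a flatter, less decisive distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

dist_a = {'meal': 0.90, 'dish': 0.05, 'surprise': 0.03, 'error': 0.02}
dist_b = {'soup': 0.30, 'stew': 0.25, 'salad': 0.22, 'dessert': 0.23}

print(entropy(dist_a) < entropy(dist_b))  # -> True: A is far more peaked than B
```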
To ensure the generated text is as coherent and factually accurate as possible, a language model must always select the single token with the highest probability from the distribution at each step of the generation process.
Learn After
Argmax Formula for Next Token Prediction
An autoregressive language model has just generated the text: 'The sun is shining and the sky is'. Based on the fundamental principle of its operation, what is the immediate next computational step the model must perform to determine the following word?
Characterizing Model Output for Next Token Prediction
An autoregressive language model is given the input sequence 'The cat sat on the'. Arrange the following steps in the correct chronological order that the model follows to generate the very next token.