Learn Before
Token Selection from Probability Distribution
After a language model computes the probability distribution for the next token, Pr(·|x_0, ..., x_{i-1}), a specific token x_i must be chosen from this distribution. This selection process, also known as decoding or sampling, is a fundamental step in text generation.
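The two most common selection strategies can be sketched in a few lines. This is a minimal illustration, not a production decoder; the distribution below is a made-up example, and `greedy_select` / `sample_select` are hypothetical helper names:

```python
import random

# Hypothetical next-token distribution Pr(. | x_0, ..., x_{i-1})
probs = {"blue": 0.75, "green": 0.15, "bright": 0.08, "falling": 0.02}

def greedy_select(probs):
    """Deterministic: always pick the single highest-probability token."""
    return max(probs, key=probs.get)

def sample_select(probs, rng=random):
    """Stochastic: sample a token in proportion to its probability."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

print(greedy_select(probs))  # → blue
print(sample_select(probs))  # any of the four tokens, usually 'blue'
```

Greedy selection always returns the same continuation for the same context, while sampling introduces variety at the cost of occasionally picking low-probability tokens.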
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Token Selection from Probability Distribution
Step-by-Step Example of Auto-Regressive Sequence Generation
Mathematical Formulation of Draft Model Prediction in Speculative Decoding
Iterative Context Update in Autoregressive Generation
Key-Value (KV) Cache in Transformer Inference
Sequential Generation of Output Tokens
Context Shifting in Auto-Regressive Generation
A language model is generating a sentence and has so far produced the sequence:
['The', 'cat', 'sat']. Based on the principles of sequential, one-at-a-time token generation, where each new token depends on the ones before it, what direct input will the model use to determine the next token in the sequence?

A language model generates text by producing a single token at each step, using the entire sequence generated so far as the context for the next token. Arrange the following events in the correct chronological order to illustrate the generation of two new tokens following the initial input 'The ocean is'.
A researcher develops a novel text generation model. Given an input like 'The movie was', instead of generating one token at a time, this model predicts the entire completion (e.g., 'incredibly boring and predictable') in a single, parallel step. Which core principle of the standard auto-regressive process is fundamentally violated by this new model's design?
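The sequential process these questions describe, where the context grows by one token per step, can be sketched as follows. The `fake_next_token` function is a hypothetical stand-in for the model's distribution-plus-selection step; a real model would compute Pr(·|context) and choose from it:

```python
def fake_next_token(context):
    # Stand-in for a real model: maps a context to its next token.
    # A real model would compute Pr(. | context) and select from it.
    continuations = {
        ("The", "ocean", "is"): "deep",
        ("The", "ocean", "is", "deep"): "blue",
    }
    return continuations.get(tuple(context), "<eos>")

def generate(context, n_steps):
    """Auto-regressive loop: one token per step, each fed back as context."""
    context = list(context)
    for _ in range(n_steps):
        token = fake_next_token(context)  # predict exactly one token
        context.append(token)             # the new token joins the context
    return context

print(generate(["The", "ocean", "is"], 2))
# → ['The', 'ocean', 'is', 'deep', 'blue']
```

The researcher's parallel model in the last question skips this loop entirely, which is precisely the property that makes it non-auto-regressive.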
Learn After
Next Token Prediction Task
Token Sampling from a Conditional Probability Distribution
Using Temperature with Softmax to Control Randomness in Token Selection
A language model is generating text and has produced the sequence 'The sky is'. It then calculates the following probability distribution for the next potential token:
{'blue': 0.75, 'green': 0.15, 'bright': 0.08, 'falling': 0.02}. If the model is configured to always select the single token with the highest probability, which token will it choose next?

Analyzing Token Selection Strategies
A language model is generating text and encounters the same input sequence on two separate occasions, producing two different probability distributions for the next token, shown below.
- Distribution A: {'meal': 0.90, 'dish': 0.05, 'surprise': 0.03, 'error': 0.02}
- Distribution B: {'soup': 0.30, 'stew': 0.25, 'salad': 0.22, 'dessert': 0.23}
Which of the following statements provides the most accurate analysis of these two distributions regarding the token selection process?
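One way to make the contrast between the two distributions concrete is to compare their greedy picks and their Shannon entropy (a standard measure of how "flat" a distribution is, not something stated in the card itself):

```python
import math

dist_a = {'meal': 0.90, 'dish': 0.05, 'surprise': 0.03, 'error': 0.02}
dist_b = {'soup': 0.30, 'stew': 0.25, 'salad': 0.22, 'dessert': 0.23}

def entropy(probs):
    """Shannon entropy in bits; higher means a flatter, less certain distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

for name, dist in [("A", dist_a), ("B", dist_b)]:
    top = max(dist, key=dist.get)  # the greedy (argmax) choice
    print(f"Distribution {name}: greedy pick={top!r}, entropy={entropy(dist):.2f} bits")
```

Distribution A is sharply peaked (greedy selection is a near-certain bet on 'meal'), while Distribution B is nearly uniform over its top tokens, so a greedy pick of 'soup' discards three alternatives that are almost as likely.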
To ensure the generated text is as coherent and factually accurate as possible, a language model must always select the single token with the highest probability from the distribution at each step of the generation process.