Step-by-Step Example of Auto-Regressive Sequence Generation
An auto-regressive language model generates text one token at a time: each new token is predicted from the sequence of tokens that precede it, and the probability of the whole sequence is the product of these conditional probabilities. The following table illustrates this process for generating three tokens $x_2$, $x_3$, and $x_4$ given the prefix $\langle s \rangle\ a$:
| Context | Predicted Token | Decision Rule | Cumulative Sequence Probability |
|---|---|---|---|
| $\langle s \rangle\ a$ | $x_2 = b$ | $\argmax_{x_2 \in V} \Pr(x_2 \mid \langle s \rangle\ a)$ | $\Pr(b \mid \langle s \rangle\ a)$ |
| $\langle s \rangle\ a\ b$ | $x_3 = c$ | $\argmax_{x_3 \in V} \Pr(x_3 \mid \langle s \rangle\ a\ b)$ | $\Pr(b \mid \langle s \rangle\ a) \cdot \Pr(c \mid \langle s \rangle\ a\ b)$ |
| $\langle s \rangle\ a\ b\ c$ | $x_4$ | $\argmax_{x_4 \in V} \Pr(x_4 \mid \langle s \rangle\ a\ b\ c)$ | $\Pr(b \mid \langle s \rangle\ a) \cdot \Pr(c \mid \langle s \rangle\ a\ b) \cdot \Pr(x_4 \mid \langle s \rangle\ a\ b\ c)$ |
At each step, the model selects the token from the vocabulary $V$ that maximizes the conditional probability (a greedy decision rule). The predicted token is then appended to the end of the context, and the updated context is used for the next step.
Tags
Ch.2 Generative Models - Foundations of Large Language Models