Example of Autoregressive Generation and Log-Probability Calculation
This example illustrates how an autoregressive model generates the sentence 'cats are playful.' by following a specific path through a search space (e.g., node 0 → 3 → 9 → 11 → 17). The overall log-probability of the generated sequence is the sum of the conditional log-probabilities of its tokens. The calculation unfolds sequentially as follows:
log Pr("cats"|x)log Pr("are"|x, "cats")log Pr("playful"|x, "cats are")log Pr("."|x, "cats are playful")Each term represents the log-probability of generating the current token, given the inputxand all previously generated tokens in the sequence.
Related
Conditional Probability in Sequence-to-Sequence Generation
Next-Token Probability Calculation in Autoregressive Decoders
Example of Autoregressive Generation and Log-Probability Calculation
An autoregressive language model is generating text following the input 'The cat sat on the'. The model's objective is to find the output sequence with the highest total log-probability. It is considering two possible two-word continuations:
Path A: 'warm mat'
- log Pr('warm' | 'The cat sat on the') = -0.9
- log Pr('mat' | 'The cat sat on the warm') = -1.5
Path B: 'plush rug'
- log Pr('plush' | 'The cat sat on the') = -1.2
- log Pr('rug' | 'The cat sat on the plush') = -1.1
Based on the provided conditional log-probabilities, which path will the model choose and why?
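A quick sketch of the comparison the question is probing, using the values copied from the prompt above:

```python
# Conditional log-probabilities from the prompt.
path_a = [-0.9, -1.5]  # 'warm', 'mat'
path_b = [-1.2, -1.1]  # 'plush', 'rug'

total_a, total_b = sum(path_a), sum(path_b)
print(f"Path A total: {total_a:.1f}")  # -2.4
print(f"Path B total: {total_b:.1f}")  # -2.3

# The model selects the higher (less negative) total log-probability.
print("Chosen:", "Path A" if total_a > total_b else "Path B")  # Path B
```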
Debugging a Generation Model's Choice
Greedy Decoding vs. Optimal Sequence Probability
Reconciling Training Log-Likelihood with Inference-Time Sequence Selection
Diagnosing a “High-Confidence Wrong Token” Bug in Autoregressive Scoring
Explaining a Counterintuitive Decoding Outcome Using Softmax, Next-Token Conditionals, and Sequence Log-Probability
Auditing a Candidate Completion Using Softmax Next-Token Probabilities and Autoregressive Log-Probability
Investigating a Production Scoring Bug: Softmax Normalization vs. Autoregressive Sequence Log-Probability
Root-Cause Analysis: Why a “More Likely” Token-by-Token Completion Loses on Total Sequence Score
Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs
Your team is building an internal tool that ranks ...
You’re reviewing an internal evaluation script tha...
You’re reviewing an internal LLM evaluation pipeli...
Direct Computation of Output Sequence Log-Probability in LLMs
Incremental Calculation of Sequence Log-Probability
Example of Autoregressive Generation and Log-Probability Calculation
A language model is generating a continuation for the input 'The best way to learn a new skill is'. It has produced two candidate sequences and calculated their total log-probabilities as follows:
- Sequence A: '...by practicing consistently.' (Total log-probability = -1.15)
- Sequence B: '...through osmotic absorption.' (Total log-probability = -7.82)
Based on these values, which sequence is considered more plausible by the model, and why?
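Converting the totals back to raw probabilities makes the gap concrete (a sketch using the values from the prompt above):

```python
import math

log_a, log_b = -1.15, -7.82  # totals from the prompt

# Higher (less negative) log-probability means a more plausible sequence.
print(f"Pr(A) = {math.exp(log_a):.4f}")  # ~0.3166
print(f"Pr(B) = {math.exp(log_b):.4f}")  # ~0.0004
```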
When a language model evaluates different possible output sequences, why is it standard practice to sum the per-token log-probabilities instead of multiplying the raw token probabilities?
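Part of the answer is numerical: multiplying many probabilities below 1 quickly underflows double-precision floats, while the equivalent sum of logs stays comfortably in range. A minimal demonstration:

```python
import math

p, n = 0.01, 400  # a 400-token sequence, each token with probability 0.01

product = 1.0
for _ in range(n):
    product *= p
print(product)  # 0.0 -- the true value 1e-800 underflows double precision

log_total = n * math.log(p)
print(f"{log_total:.2f}")  # -1842.07 -- the same quantity, safely representable
```

Because log is monotonic, ranking sequences by summed log-probability gives the same order as ranking by the product of raw probabilities.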
A language model has generated the sequence 'The sun is' with a cumulative log-probability of -2.5. The model is now considering the next token. Given the following conditional log-probabilities for the next token, which choice would result in the most probable four-word sequence?
Learn After
An autoregressive language model is generating a sequence and must choose between two possible next phrases to complete the input 'The mountain path was...'. The model has calculated the conditional log-probabilities for the tokens in each potential phrase as follows:
Phrase A: 'steep and rocky'
- log Pr('steep' | 'The mountain path was') = -0.9
- log Pr('and' | 'The mountain path was steep') = -1.5
- log Pr('rocky' | 'The mountain path was steep and') = -0.7
Phrase B: 'long but scenic'
- log Pr('long' | 'The mountain path was') = -1.2
- log Pr('but' | 'The mountain path was long') = -1.3
- log Pr('scenic' | 'The mountain path was long but') = -0.4
Based on these values, which phrase is the model more likely to generate, and what is its total log-probability?
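The required arithmetic, sketched with the values above:

```python
# Conditional log-probabilities from the prompt.
phrase_a = [-0.9, -1.5, -0.7]  # 'steep', 'and', 'rocky'
phrase_b = [-1.2, -1.3, -0.4]  # 'long', 'but', 'scenic'

print(f"Phrase A total: {sum(phrase_a):.1f}")  # -3.1
print(f"Phrase B total: {sum(phrase_b):.1f}")  # -2.9 (preferred: higher total)
```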
Analyzing Suboptimal Autoregressive Generation
An autoregressive language model must calculate the total log-probability for the generated sentence: "The sky is blue." The process involves summing the conditional log-probabilities of each token in sequence. Below are four potential breakdowns of this calculation. Which one correctly represents the sequence of calculations the model must perform?
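Whatever the four candidate breakdowns look like, the correct one must follow the autoregressive chain rule. A generic sketch that prints the required chain of conditionals (splitting on whitespace is a simplification; real models condition on subword tokens):

```python
def log_prob_terms(tokens):
    # One conditional term per token: log Pr(token | x, all previous tokens).
    terms = []
    for i, tok in enumerate(tokens):
        history = " ".join(tokens[:i])
        cond = f'x, "{history}"' if history else "x"
        terms.append(f'log Pr("{tok}" | {cond})')
    return terms

print(" + ".join(log_prob_terms(["The", "sky", "is", "blue", "."])))
# log Pr("The" | x) + log Pr("sky" | x, "The") + log Pr("is" | x, "The sky")
#   + log Pr("blue" | x, "The sky is") + log Pr("." | x, "The sky is blue")
```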