Learn Before
Greedy Decoding vs. Optimal Sequence Probability
A language model is generating a response to the input 'New York is a'. At the first step, the token 'city' has a higher probability than the token 'big'. However, the globally optimal two-word completion is found to be 'big apple'. Explain, using the mathematical objective of inference, how it is possible for a sequence starting with a less probable word ('big') to ultimately have a higher total log-probability than a sequence starting with a more probable word ('city').
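To make the objective concrete: at inference time the model searches for the output sequence y that maximizes the total log-probability, sum over t of log Pr(y_t | x, y_1, ..., y_{t-1}), not the probability of any single step. The minimal Python sketch below illustrates this with made-up numbers (the continuation token 'place' and all probability values are hypothetical, chosen only so that 'city' wins the first step while 'big apple' wins the total):

import math

# Hypothetical conditional probabilities, keyed by token prefix.
# A real model would produce these via a softmax over its vocabulary.
logp = {
    ("city",): math.log(0.40),          # Pr('city' | 'New York is a') -- the greedy pick
    ("big",): math.log(0.25),           # Pr('big'  | 'New York is a') -- less probable
    ("city", "place"): math.log(0.10),  # best continuation after 'city' (hypothetical)
    ("big", "apple"): math.log(0.60),   # 'apple' is very likely after 'big' (hypothetical)
}

def seq_logprob(tokens):
    # Total log-probability = sum of conditional log-probs along the prefix chain.
    return sum(logp[tuple(tokens[: i + 1])] for i in range(len(tokens)))

print(seq_logprob(["city", "place"]))  # log(0.40) + log(0.10) = log(0.04) ~ -3.22
print(seq_logprob(["big", "apple"]))   # log(0.25) + log(0.60) = log(0.15) ~ -1.90 (higher)

A large later conditional (0.60) more than compensates for the smaller first step (0.25 vs. 0.40), which is exactly how 'big apple' can beat a greedy path that starts with 'city'.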
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Conditional Probability in Sequence-to-Sequence Generation
Next-Token Probability Calculation in Autoregressive Decoders
Example of Autoregressive Generation and Log-Probability Calculation
An autoregressive language model is generating text following the input 'The cat sat on the'. The model's objective is to find the output sequence with the highest total log-probability. It is considering two possible two-word continuations:
Path A: 'warm mat'
- log Pr('warm' | 'The cat sat on the') = -0.9
- log Pr('mat' | 'The cat sat on the warm') = -1.5
Path B: 'plush rug'
- log Pr('plush' | 'The cat sat on the') = -1.2
- log Pr('rug' | 'The cat sat on the plush') = -1.1
Based on the provided conditional log-probabilities, which path will the model choose and why?
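For reference, the totals can be checked directly in Python (the values are exactly those listed above):

path_a = -0.9 + (-1.5)  # 'warm mat'  -> -2.4
path_b = -1.2 + (-1.1)  # 'plush rug' -> -2.3
print(path_a, path_b)   # Path B is less negative, i.e. a higher total log-probability

So a model maximizing total sequence log-probability chooses Path B ('plush rug'), even though Path A's first token 'warm' has the higher first-step log-probability (-0.9 > -1.2).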
Debugging a Generation Model's Choice
Greedy Decoding vs. Optimal Sequence Probability
Reconciling Training Log-Likelihood with Inference-Time Sequence Selection
Diagnosing a "High-Confidence Wrong Token" Bug in Autoregressive Scoring
Explaining a Counterintuitive Decoding Outcome Using Softmax, Next-Token Conditionals, and Sequence Log-Probability
Auditing a Candidate Completion Using Softmax Next-Token Probabilities and Autoregressive Log-Probability
Investigating a Production Scoring Bug: Softmax Normalization vs. Autoregressive Sequence Log-Probability
Root-Cause Analysis: Why a "More Likely" Token-by-Token Completion Loses on Total Sequence Score
Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs
Your team is building an internal tool that ranks ...
You're reviewing an internal evaluation script tha...
You're reviewing an internal LLM evaluation pipeli...
Direct Computation of Output Sequence Log-Probability in LLMs