Conditional Probability in Sequence-to-Sequence Generation
In sequence-to-sequence models, the probability of generating a specific output token is conditioned on both the entire input sequence and all previously generated output tokens. This is represented by the formula Pr(y_t | x, y_<t), where y_t is the current output token, x is the complete input sequence, and y_<t denotes the output tokens already generated. This conditional probability is the core calculation performed at each step of the auto-regressive generation process.
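To make this concrete, below is a minimal sketch of scoring a candidate output under this factorization. It assumes a hypothetical model object exposing a next_token_distribution(x, prefix) method that returns Pr(y_t | x, y_<t) for each candidate token; the interface is illustrative, not a real library API.

```python
import math

def sequence_log_probability(model, x, y):
    """Sum log Pr(y_t | x, y_<t) over every position t of the output y."""
    total = 0.0
    for t in range(len(y)):
        # Condition on the full input x and the tokens generated so far, y[:t].
        dist = model.next_token_distribution(x, y[:t])  # hypothetical method
        total += math.log(dist[y[t]])
    return total
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on long sequences.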
Related
Next-Token Probability Calculation in Autoregressive Decoders
Example of Autoregressive Generation and Log-Probability Calculation
An auto-regressive language model is generating text following the input 'The cat sat on the'. The model's objective is to find the output sequence with the highest total log-probability. It is considering two possible two-word continuations:
Path A: 'warm mat'
- log Pr('warm' | 'The cat sat on the') = -0.9
- log Pr('mat' | 'The cat sat on the warm') = -1.5
Path B: 'plush rug'
- log Pr('plush' | 'The cat sat on the') = -1.2
- log Pr('rug' | 'The cat sat on the plush') = -1.1
Based on the provided conditional log-probabilities, which path will the model choose and why?
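For a concrete check, the two totals can be summed directly from the numbers given above; this tiny snippet mirrors that arithmetic:

```python
path_a = -0.9 + (-1.5)  # 'warm mat':  total log-probability -2.4
path_b = -1.2 + (-1.1)  # 'plush rug': total log-probability -2.3

# Path B wins (-2.3 > -2.4), even though greedy decoding would have taken
# 'warm' (-0.9) over 'plush' (-1.2) at the first step.
best = max([('warm mat', path_a), ('plush rug', path_b)], key=lambda p: p[1])
print(best)
```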
Debugging a Generation Model's Choice
Greedy Decoding vs. Optimal Sequence Probability
Reconciling Training Log-Likelihood with Inference-Time Sequence Selection
Diagnosing a “High-Confidence Wrong Token” Bug in Autoregressive Scoring
Explaining a Counterintuitive Decoding Outcome Using Softmax, Next-Token Conditionals, and Sequence Log-Probability
Auditing a Candidate Completion Using Softmax Next-Token Probabilities and Autoregressive Log-Probability
Investigating a Production Scoring Bug: Softmax Normalization vs. Autoregressive Sequence Log-Probability
Root-Cause Analysis: Why a “More Likely” Token-by-Token Completion Loses on Total Sequence Score
Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs
Direct Computation of Output Sequence Log-Probability in LLMs
A language model is generating text and has so far produced the sequence 'The sky is'. The model now needs to calculate the likelihood of the next word being 'blue'. Which of the following mathematical expressions correctly represents the probability of the next word being 'blue', given the preceding words?
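As a sketch of where such a probability comes from, the snippet below softmax-normalizes a few made-up logits for the context 'The sky is'; the vocabulary and scores are invented purely for illustration:

```python
import math

def softmax(logits):
    """Turn raw logits into a normalized probability distribution."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Invented logits the model might assign after the context 'The sky is'.
logits = {'blue': 4.0, 'clear': 2.5, 'falling': 0.5}
probs = softmax(logits)
print(probs['blue'])  # Pr('blue' | 'The sky is') ≈ 0.80
```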
Notation for Machine Translation Probability
Formula for Re-weighting a Probability Distribution with a Reward Function
Applying Conditional Probability Notation in Text Summarization
Learn After
A language model generates an output sequence one token at a time, where each new token's probability depends on prior information. If the model has already produced the first three tokens of an output based on a given input sequence, which of the following best describes the complete set of information used to calculate the probability for the fourth token?
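In the notation introduced at the top of this page, that quantity is Pr(y_4 | x, y_1, y_2, y_3): the model conditions on the complete input sequence x together with the three output tokens it has already produced, and on nothing else.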
Analyzing Generation Processes
Analyzing a Translation Model's Error