Mathematical Formulation of LLM Inference
The inference process for Large Language Models is mathematically defined as identifying the most probable output sequence given an input context x = (x_1, ..., x_m). This involves determining the sequence that maximizes the conditional log-probability: ŷ = argmax_y log Pr(y | x). To account for the step-by-step nature of text generation, this equation calculates the sum of the log-probabilities for predicting each individual token starting from position m + 1, rather than position 1. Each token's probability is conditioned on the initial context sequence (x_1, ..., x_m) and all prior generated tokens (x_{m+1}, ..., x_{i-1}): log Pr(y | x) = Σ_{i=m+1}^{m+n} log Pr(x_i | x_1, ..., x_{i-1}), where y = (x_{m+1}, ..., x_{m+n}).
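As a minimal sketch of this objective, the summation can be implemented by scoring each candidate continuation token by token. The interface `next_token_logprob` is a hypothetical stand-in for a real model's conditional next-token distribution, not part of any particular library:

```python
def sequence_logprob(context, candidate, next_token_logprob):
    """Sum log Pr(x_i | x_1, ..., x_{i-1}) over each generated token,
    starting at position m + 1 (the first token after the context)."""
    prefix = list(context)
    total = 0.0
    for token in candidate:
        # Condition on the context plus all previously generated tokens
        total += next_token_logprob(tuple(prefix), token)
        prefix.append(token)
    return total

def select_output(context, candidates, next_token_logprob):
    """Return the candidate y that maximizes log Pr(y | x)."""
    return max(candidates,
               key=lambda y: sequence_logprob(context, y, next_token_logprob))
```

In practice the space of all possible sequences is far too large to enumerate, so decoders approximate this argmax with strategies such as greedy or beam search rather than scoring every candidate.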

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Mathematical Formulation of LLM Inference
Single-Round Prediction Problem
Token-Level Representation of Input and Output Sequences for a Forward Pass
Multi-Round Prediction Problem
Notation for Concatenated Token Sequences
A language model is given an input sequence of tokens representing the phrase 'The best way to learn a new skill is'. The model then calculates the likelihood for several possible completing sequences. Based on the formal objective of the text generation process, which of the following sequences should the model select to output?
Analyzing Model Output Selection
A language model is given an input context x. It then evaluates two potential output sequences, y_1 and y_2. The model's internal calculations determine that y_1 has a higher probability of occurring after x than y_2. However, a human evaluator finds y_2 to be more creative and detailed. According to the formal objective of the text generation process, what should the model do?
Equivalence of Maximizing Auto-regressive Log-Likelihood and Minimizing Cross-Entropy Loss
Conditional vs. Joint Probability Objectives in Language Modeling
Notational Convention for Autoregressive Conditional Probability
Modeling and Efficient Computation of Conditional Token Probabilities
A language model is generating a response sequence 'y' given an input context 'x'. The model generates the two-token sequence y = ('deep', 'learning'). The model's calculated log-probabilities for each step of the generation are as follows:
- Log-probability of the first token: log Pr(y₁='deep' | x) = -0.7
- Log-probability of the second token, given the first: log Pr(y₂='learning' | x, y₁='deep') = -0.4
Based on the standard method for calculating the probability of a full sequence, what is the total conditional log-likelihood of the entire sequence 'y', i.e., log Pr(y|x)?
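Applying the chain-rule decomposition, the sequence log-likelihood is simply the sum of the per-step conditional log-probabilities; a minimal check in Python:

```python
# Per-step conditional log-probabilities from the example above
logp_step1 = -0.7  # log Pr(y1='deep' | x)
logp_step2 = -0.4  # log Pr(y2='learning' | x, y1='deep')

# log Pr(y|x) = log Pr(y1|x) + log Pr(y2|x, y1)
logp_sequence = logp_step1 + logp_step2
print(round(logp_sequence, 10))  # -1.1
```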
Comparing Model Confidence via Log-Likelihood
Analyzing a Flawed Log-Likelihood Calculation
Learn After
Conditional Probability in Sequence-to-Sequence Generation
Next-Token Probability Calculation in Autoregressive Decoders
Example of Autoregressive Generation and Log-Probability Calculation
An auto-regressive language model is generating text following the input 'The cat sat on the'. The model's objective is to find the output sequence with the highest total log-probability. It is considering two possible two-word continuations:
Path A: 'warm mat'
- log Pr('warm' | 'The cat sat on the') = -0.9
- log Pr('mat' | 'The cat sat on the warm') = -1.5
Path B: 'plush rug'
- log Pr('plush' | 'The cat sat on the') = -1.2
- log Pr('rug' | 'The cat sat on the plush') = -1.1
Based on the provided conditional log-probabilities, which path will the model choose and why?
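Working the comparison through: Path A totals -0.9 + (-1.5) = -2.4, while Path B totals -1.2 + (-1.1) = -2.3, so Path B wins on total log-probability even though its first token is individually less likely. A quick sketch:

```python
# Per-step conditional log-probabilities for each candidate path
path_a = [-0.9, -1.5]  # 'warm', then 'mat'
path_b = [-1.2, -1.1]  # 'plush', then 'rug'

score_a = sum(path_a)  # -2.4 (up to float rounding)
score_b = sum(path_b)  # -2.3 (up to float rounding)

# The objective selects the path with the higher total log-probability
best = "Path B" if score_b > score_a else "Path A"
print(best)  # Path B
```

Note that a greedy decoder would have committed to 'warm' at the first step (-0.9 > -1.2) and missed the higher-scoring sequence, which is exactly the greedy-vs-optimal distinction raised below.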
Debugging a Generation Model's Choice
Greedy Decoding vs. Optimal Sequence Probability
Reconciling Training Log-Likelihood with Inference-Time Sequence Selection
Diagnosing a “High-Confidence Wrong Token” Bug in Autoregressive Scoring
Explaining a Counterintuitive Decoding Outcome Using Softmax, Next-Token Conditionals, and Sequence Log-Probability
Auditing a Candidate Completion Using Softmax Next-Token Probabilities and Autoregressive Log-Probability
Investigating a Production Scoring Bug: Softmax Normalization vs. Autoregressive Sequence Log-Probability
Root-Cause Analysis: Why a “More Likely” Token-by-Token Completion Loses on Total Sequence Score
Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs
Your team is building an internal tool that ranks ...
You’re reviewing an internal evaluation script tha...
You’re reviewing an internal LLM evaluation pipeli...
Direct Computation of Output Sequence Log-Probability in LLMs