Autoregressive Decomposition of the LLM Inference Objective
In large language model inference, the optimal output sequence is the one that maximizes the conditional log-probability of the output given the input x. This objective, expressed as finding y* = argmax_y log P(y | x), can be decomposed using the chain rule of probability: the total log-probability of the output sequence equals the sum of the conditional log-probabilities of its individual tokens. This is expressed as:

log P(y | x) = Σ_{t=1}^{|y|} log P(y_t | y_{<t}, x)

In this formula, x represents the entire input sequence and y_{<t} represents all previously generated output tokens. A more explicit representation of the conditional probability term is P(y_t | x_1, ..., x_m, y_1, ..., y_{t-1}), where the input sequence is x = x_1 ... x_m and the preceding output is y_1 ... y_{t-1}. This formulation is the mathematical basis for autoregressive generation.
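As a small numerical illustration of the decomposition (a minimal sketch; the per-step probabilities below are hypothetical values chosen for the arithmetic, not produced by any real model), the sequence log-probability can be computed in Python by summing per-token log-probabilities:

import math

# Hypothetical per-step conditional probabilities for a three-token output:
# P(y_1 | x), P(y_2 | y_1, x), P(y_3 | y_1, y_2, x)
step_probs = [0.6, 0.5, 0.8]

# Autoregressive decomposition: log P(y | x) = sum of log P(y_t | y_{<t}, x)
sequence_log_prob = sum(math.log(p) for p in step_probs)

print(sequence_log_prob)            # about -1.427
print(math.exp(sequence_log_prob))  # 0.24, i.e. 0.6 * 0.5 * 0.8

Summing log-probabilities is numerically safer than multiplying raw probabilities, and exponentiating the sum recovers the product form P(y_1 | x) * P(y_2 | y_1, x) * P(y_3 | y_1, y_2, x), which is exactly the quantity the argmax objective compares across candidate sequences.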

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A language model is generating a response based on a user's input. For this input, the model can generate many different possible sequences of words. The model's core task is to select the single best sequence from all these possibilities. According to the mathematical objective that governs this selection, which principle should the model follow?
Autoregressive Decomposition of the LLM Inference Objective
Optimal Sequence Selection
Search for Optimal Output Sequence in LLMs
Interpreting the LLM Search Objective
Learn After
Mathematical Justification for Greedy Search
A language model needs to compute the total log-probability for generating the specific three-token sequence y = (y_1, y_2, y_3) given an input x. Based on the standard autoregressive formulation, which of the following expressions correctly represents this calculation?
Calculating Sequence Log-Probability
Analysis of Text Generation Approaches
You’re reviewing an internal evaluation script tha...
Your team is building an internal tool that ranks ...
You’re reviewing an internal LLM evaluation pipeli...
Reconciling Training Log-Likelihood with Inference-Time Sequence Selection
Explaining a Counterintuitive Decoding Outcome Using Softmax, Next-Token Conditionals, and Sequence Log-Probability
Diagnosing a “High-Confidence Wrong Token” Bug in Autoregressive Scoring
Investigating a Production Scoring Bug: Softmax Normalization vs. Autoregressive Sequence Log-Probability
Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs
Root-Cause Analysis: Why a “More Likely” Token-by-Token Completion Loses on Total Sequence Score
Auditing a Candidate Completion Using Softmax Next-Token Probabilities and Autoregressive Log-Probability