Essay

Reconciling Training Log-Likelihood with Inference-Time Sequence Selection

You are reviewing an internal incident report: a product team claims their LLM “should have generated” a particular 3-token continuation y = (y_1, y_2, y_3) after a prompt x because, at each step, the model assigned the highest next-token probability to the token that appears in that continuation. Another team counters that the correct inference objective is to choose the continuation that maximizes the conditional probability of the entire sequence given x, and that this can disagree with stepwise top-1 choices.
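The two competing claims can be written compactly. The following is a standard-notation sketch (z_v denotes the model's logit for vocabulary token v), not language taken from the incident report:

```latex
% Sequence-level objective and its autoregressive decomposition:
\hat{y} \;=\; \arg\max_{y} P(y \mid x)
        \;=\; \arg\max_{y} \sum_{i=1}^{|y|} \log P\!\left(y_i \mid x,\, y_{<i}\right),
\qquad
P(y_i = v \mid x,\, y_{<i}) \;=\; \frac{\exp(z_v)}{\sum_{u} \exp(z_u)}.
% Greedy decoding instead maximizes each term in isolation:
y_i^{\mathrm{greedy}} \;=\; \arg\max_{v}\; P\!\left(v \mid x,\, y_{<i}^{\mathrm{greedy}}\right).
```

Greedy decoding maximizes each log term separately, which is not the same as maximizing their sum; that distinction is the crux of the disagreement between the two teams.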

Write an analysis that (1) states the mathematical inference objective for selecting an output sequence given x, (2) decomposes that objective autoregressively into next-token conditional probabilities, (3) explains how the model obtains each next-token probability from logits using softmax, and (4) connects this to the training objective by explaining how maximizing log-likelihood over data relates to (but does not guarantee) greedy stepwise selection at inference. Your answer should explicitly use log-probabilities (sum of logs) to justify why “highest at each step” is not the same claim as “highest total sequence probability,” and should include at least one concrete numeric mini-example (you may invent numbers) showing how two different 3-token continuations can lead to this disagreement.
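The requested numeric mini-example can be sketched in a few lines of Python. All probabilities and logits below are invented (as the prompt permits); the point is only that a continuation whose first token is not the stepwise argmax can still have the higher total sequence probability:

```python
import math

def softmax(logits):
    """Map a logit vector to next-token probabilities (point 3 of the prompt)."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def seq_log_prob(step_probs):
    """log P(y | x) = sum of per-step log conditionals (autoregressive chain rule)."""
    return sum(math.log(p) for p in step_probs)

# Softmax turns raw logits into a distribution that sums to 1.
probs = softmax([2.0, 1.0, 0.1])

# Invented per-step conditional probabilities for two 3-token continuations.
greedy = [0.50, 0.40, 0.40]  # the stepwise argmax at every step
alt    = [0.40, 0.90, 0.90]  # step 1 is NOT the argmax; later steps are confident

print(seq_log_prob(greedy))  # log(0.080) ≈ -2.526
print(seq_log_prob(alt))     # log(0.324) ≈ -1.127  -> higher total probability
```

Summing logs rather than multiplying raw probabilities is the usual practical choice (it avoids underflow and turns the argmax over products into an argmax over sums), and it makes the disagreement plain: `greedy` wins every individual step it takes, yet `alt` wins the sum.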


Updated 2026-02-06


Tags: Foundations of Large Language Models (Ch.1 Pre-training, Ch.2 Generative Models, Ch.5 Inference), Foundations of Large Language Models Course, Computing Sciences, Data Science
