
Autoregressive Decomposition of the LLM Inference Objective

In large language model inference, the optimal output sequence $\hat{\mathbf{y}}$ is the one that maximizes the conditional log-probability given the input $\mathbf{x}$. This objective, expressed as finding the argument that maximizes $\log \Pr(\mathbf{y}|\mathbf{x})$, can be decomposed using the chain rule of probability: the total log-probability of the output sequence equals the sum of the conditional log-probabilities of the individual tokens $y_i$. This is expressed as:

$$\hat{\mathbf{y}} = \underset{\mathbf{y}}{\arg\max}\, \log \Pr(\mathbf{y}|\mathbf{x}) = \underset{\mathbf{y}}{\arg\max} \sum_{i=1}^{n} \log \Pr(y_i|\mathbf{x}, \mathbf{y}_{<i})$$

In this formula, $\mathbf{x}$ denotes the entire input sequence and $\mathbf{y}_{<i}$ denotes all previously generated output tokens. Written out in full, the conditional probability term is $\Pr(y_i|x_0, \ldots, x_m, y_1, \ldots, y_{i-1})$, where the input sequence is $(x_0, \ldots, x_m)$ and the preceding output is $(y_1, \ldots, y_{i-1})$. This formulation is the mathematical basis for autoregressive generation.
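The decomposition can be sketched concretely: a toy next-token distribution stands in for the model's $\Pr(y_i|\mathbf{x}, \mathbf{y}_{<i})$, and the sequence score is accumulated token by token exactly as in the sum above. A minimal sketch (the vocabulary, `toy_conditional`, and `sequence_log_prob` are illustrative assumptions, not from the text):

```python
import math

# A tiny stand-in for an LLM's next-token distribution Pr(y_i | x, y_<i).
# It is conditioned on the full context: the input x plus all previously
# generated tokens. A real model exposes the same interface via a softmax.
VOCAB = ["<eos>", "hello", "world"]

def toy_conditional(context):
    """Hypothetical next-token distribution over VOCAB given the context."""
    if context and context[-1] == "hello":
        probs = [0.1, 0.1, 0.8]   # strongly favor "world" after "hello"
    else:
        probs = [0.6, 0.3, 0.1]   # otherwise favor ending the sequence
    return dict(zip(VOCAB, probs))

def sequence_log_prob(x, y):
    """log Pr(y|x) = sum_i log Pr(y_i | x, y_<i), by the chain rule."""
    total = 0.0
    context = list(x)
    for token in y:
        total += math.log(toy_conditional(context)[token])
        context.append(token)  # each generated token joins the context
    return total
```

Because the log turns the product of conditionals into a sum, the score of a candidate sequence is just the running total of per-token log-probabilities; decoding strategies such as greedy or beam search then approximate the $\arg\max$ over this score.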

Updated 2026-05-02

Ch.2 Generative Models - Foundations of Large Language Models