
Mathematical Equivalence of General and Sequential MLE Objectives

The general maximum likelihood estimation (MLE) objective for a dataset $\mathcal{D}$ can be re-expressed for sequential data by applying the chain rule of probability. The chain rule decomposes the log-probability of each full sequence $\mathbf{x}$ into a sum of conditional log-probabilities, one per predicted token, which shows that the standard objective and its autoregressive sequential form are mathematically equivalent:

$$\hat{\theta} = \argmax_{\theta} \sum_{\mathbf{x} \in \mathcal{D}} \log \mathrm{Pr}_{\theta}(\mathbf{x}) = \argmax_{\theta} \sum_{\mathbf{x} \in \mathcal{D}} \sum_{i=0}^{m-1} \log \mathrm{Pr}_{\theta}(x_{i+1}|x_0,...,x_{i})$$

Here each sequence is written $\mathbf{x} = x_0 x_1 \ldots x_m$, so the inner sum runs over the $m$ predicted tokens $x_1, \ldots, x_m$. The equality assumes, as is usual in this setting, that $x_0$ is a fixed start symbol, so that $\log \mathrm{Pr}_{\theta}(x_0)$ contributes nothing to the objective.
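The equivalence can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the binary vocabulary, the sequence length, the randomly generated joint distribution, and all function names (`make_joint`, `prefix_prob`, `chain_rule_log_prob`) are hypothetical and not taken from the source. The start symbol $x_0$ is left implicit, so each tuple holds only $x_1, \ldots, x_m$.

```python
import itertools
import math
import random

VOCAB = (0, 1)   # toy vocabulary (assumption for illustration)
LENGTH = 3       # number of predicted tokens x_1..x_m, so m = 3


def make_joint(seed=0):
    """Build an arbitrary joint distribution over all sequences x_1..x_m."""
    rng = random.Random(seed)
    seqs = list(itertools.product(VOCAB, repeat=LENGTH))
    weights = [rng.random() + 0.1 for _ in seqs]
    total = sum(weights)
    return {s: w / total for s, w in zip(seqs, weights)}


def prefix_prob(joint, prefix):
    """Marginal probability that a sequence starts with `prefix`."""
    n = len(prefix)
    return sum(p for s, p in joint.items() if s[:n] == prefix)


def joint_log_prob(joint, seq):
    """General form: log Pr(x) read directly from the joint distribution."""
    return math.log(joint[seq])


def chain_rule_log_prob(joint, seq):
    """Sequential form: sum of log Pr(x_{i+1} | x_0, ..., x_i), i = 0..m-1."""
    total = 0.0
    for i in range(len(seq)):
        # Conditional = Pr(prefix of length i+1) / Pr(prefix of length i).
        cond = prefix_prob(joint, seq[:i + 1]) / prefix_prob(joint, seq[:i])
        total += math.log(cond)
    return total


joint = make_joint()
dataset = [(0, 1, 1), (1, 0, 0), (1, 1, 0)]  # toy dataset D

general = sum(joint_log_prob(joint, x) for x in dataset)
sequential = sum(chain_rule_log_prob(joint, x) for x in dataset)

# The two objectives agree up to floating-point rounding.
print(math.isclose(general, sequential, rel_tol=1e-9))
```

Because the two sums agree for any joint distribution, they agree for every value of $\theta$, and so they share the same maximizer.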

Updated 2026-05-02

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences