Learn Before
Mathematical Equivalence of General and Sequential MLE Objectives
The general maximum likelihood estimation formulation for a dataset can be re-expressed for sequential data by applying the chain rule of probability. This adaptation decomposes the log-probability of each full sequence into a sum of conditional log-probabilities, thereby demonstrating mathematical equivalence between the standard objective and its autoregressive sequential form:

\arg\max_{\theta} \sum_{\mathbf{x} \in D} \log \Pr(\mathbf{x}; \theta) = \arg\max_{\theta} \sum_{\mathbf{x} \in D} \sum_{i=1}^{m} \log \Pr(x_i \mid x_{<i}; \theta)

where m is the length of sequence \mathbf{x} and x_{<i} denotes the tokens preceding position i.
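The equivalence can be checked numerically with a toy autoregressive model (an illustrative sketch with made-up probabilities, not part of the original card): the log of the chain-rule product equals the sum of the conditional log-probabilities.

```python
import math

# Toy bigram model over a binary alphabet: Pr(x_i = 1 | x_{i-1})
# depends only on the previous symbol (None marks sequence start).
# These probabilities are invented purely for illustration.
cond_p_one = {None: 0.6, 0: 0.3, 1: 0.7}

def cond_prob(symbol, prev):
    """Conditional probability of `symbol` given the previous symbol."""
    p1 = cond_p_one[prev]
    return p1 if symbol == 1 else 1.0 - p1

sequence = [1, 0, 0, 1]

# Form 1: log-probability of the whole sequence via the chain-rule product.
joint = 1.0
prev = None
for x in sequence:
    joint *= cond_prob(x, prev)
    prev = x
log_joint = math.log(joint)

# Form 2: sum of the conditional log-probabilities.
log_sum = 0.0
prev = None
for x in sequence:
    log_sum += math.log(cond_prob(x, prev))
    prev = x

# The two forms agree up to floating-point error.
assert math.isclose(log_joint, log_sum)
```

Because the agreement holds term by term, the same check extends to a sum over a whole dataset of sequences, which is exactly the MLE objective above.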
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Relationship between KL Divergence and MLE
Cross-entropy loss
Mean Squared Error
The property of consistency of maximum likelihood
Statistical Efficiency Principle of MLE
Maximum Likelihood Estimator Properties
Log-Likelihood Gradient
Maximum Likelihood Training Objective for a Dataset of Sequences
Kullback-Leibler Divergence
Model Selection via Likelihood
Training Objective as Loss Minimization over a Dataset
Mathematical Equivalence of General and Sequential MLE Objectives
A researcher is modeling a series of coin flips. They observe the following sequence of outcomes: Heads, Tails, Heads, Heads. The researcher wants to find the best parameter for their model, where the parameter represents the probability of the coin landing on Heads. According to the principle of maximum likelihood estimation, which of the following parameter values best explains the observed data?
Parameter Estimation via Conditional Log-Likelihood Maximization
Equivalence of Maximizing Likelihood and Minimizing Loss
Equivalence of Squared Loss and Maximum Likelihood Estimation
Negative Log-Likelihood Objective for Softmax Regression
Learn After
Maximum Likelihood Estimation for Sequential Data
In training a model on a dataset (D) of sequences (\mathbf{x}), a primary goal is to find parameters that maximize the total log-probability of the observed sequences. This objective can be expressed in two equivalent ways:
Form 1: \sum_{\mathbf{x} \in D} \log \Pr(\mathbf{x}; \theta)
Form 2: \sum_{\mathbf{x} \in D} \sum_{i} \log \Pr(x_i \mid x_{<i}; \theta)
What fundamental principle of probability justifies the mathematical equivalence between Form 1 and Form 2?
Verifying Log-Probability Equivalence
Analysis of a Language Model Training Objective
The mathematical equivalence between maximizing the log-probability of an entire sequence, log Pr(x), and maximizing the sum of its conditional log-probabilities, Σ log Pr(x_i | x_<i), is established because the chain rule of probability factorizes Pr(x) into a product of conditional probabilities, and the logarithm then transforms that product into a sum of log-probabilities.
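The justification rests on two steps, which can be written out explicitly (a standard derivation, assuming a sequence of length m and using the same notation as the card):

```latex
% Step 1: the chain rule of probability factorizes the joint
% distribution into a product of conditionals.
\Pr(\mathbf{x}) = \prod_{i=1}^{m} \Pr(x_i \mid x_{<i})

% Step 2: the logarithm turns that product into a sum.
\log \Pr(\mathbf{x}) = \sum_{i=1}^{m} \log \Pr(x_i \mid x_{<i})
```

Note the division of labor: the chain rule supplies the product, and the logarithm supplies the sum.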