Equivalence of Maximizing Auto-regressive Log-Likelihood and Minimizing Cross-Entropy Loss
The objective of maximizing the auto-regressive log-likelihood of a sequence, computed by summing the conditional log-probabilities of its tokens, is mathematically equivalent to minimizing the total cross-entropy loss between the model's per-step predicted distributions and the one-hot encoded target tokens: with a one-hot target, the cross-entropy at each step reduces to the negative log-probability of the correct token.
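A compact derivation of this equivalence (a sketch; the symbols qₜ for the one-hot target distribution at step t and pₜ for the model's predicted distribution are introduced here for illustration, not taken from the cards below):

```latex
% Chain rule: the sequence log-likelihood is the sum of per-step
% conditional log-probabilities.
\log \Pr(y \mid x) = \sum_{t=1}^{T} \log \Pr(y_t \mid x, y_{<t})

% The per-step cross-entropy against a one-hot target q_t
% (q_t(v) = 1 iff v = y_t) keeps only the correct token's term:
H(q_t, p_t) = -\sum_{v \in V} q_t(v) \log p_t(v)
            = -\log \Pr(y_t \mid x, y_{<t})

% Summing over steps: total cross-entropy loss = negative log-likelihood,
% so minimizing one is the same as maximizing the other.
\sum_{t=1}^{T} H(q_t, p_t) = -\log \Pr(y \mid x)
```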
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Mathematical Formulation of LLM Inference
Conditional vs. Joint Probability Objectives in Language Modeling
Notational Convention for Autoregressive Conditional Probability
Modeling and Efficient Computation of Conditional Token Probabilities
A language model is generating a response sequence 'y' given an input context 'x'. The model generates the two-token sequence y = ('deep', 'learning'). The model's calculated log-probabilities for each step of the generation are as follows:
- Log-probability of the first token: log Pr(y₁='deep' | x) = -0.7
- Log-probability of the second token, given the first: log Pr(y₂='learning' | x, y₁='deep') = -0.4

Based on the standard method for calculating the probability of a full sequence, what is the total conditional log-likelihood of the entire sequence 'y', i.e., log Pr(y|x)?
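A one-line check of the arithmetic the question asks for (values copied from the question; by the chain rule, the per-step conditional log-probabilities are summed):

```python
import math

log_p1 = -0.7  # log Pr(y1='deep' | x)
log_p2 = -0.4  # log Pr(y2='learning' | x, y1='deep')

# Chain rule: log Pr(y|x) = log Pr(y1|x) + log Pr(y2|x, y1)
log_likelihood = log_p1 + log_p2
print(log_likelihood)             # -1.1
print(math.exp(log_likelihood))   # Pr(y|x) = exp(-1.1) ≈ 0.333
```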
Comparing Model Confidence via Log-Likelihood
Analyzing a Flawed Log-Likelihood Calculation
Learn After
A machine learning engineer is training a language model on a text corpus. During training, they plot two values at each step:
- The average negative log-likelihood of the target sequences.
- The cross-entropy loss between the model's predicted probability distributions and the one-hot encoded target tokens.
The engineer observes that the two plots are identical. Which of the following statements provides the most accurate mathematical justification for this observation?
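A numerical sketch of why the two plots coincide (made-up logits and targets; the shapes T and V are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
T, V = 4, 10                           # sequence length, vocab size (arbitrary)
logits = rng.normal(size=(T, V))       # one logit vector per generation step
targets = rng.integers(0, V, size=T)   # ground-truth token ids

# Log-softmax: the model's predicted log-distribution at each step.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# (1) Average negative log-likelihood of the target tokens.
nll = -log_probs[np.arange(T), targets].mean()

# (2) Cross-entropy against one-hot targets: H(q, p) = -sum_v q(v) log p(v).
one_hot = np.eye(V)[targets]
ce = -(one_hot * log_probs).sum(axis=1).mean()

# Identical: the one-hot q zeroes out every term except the target token's,
# leaving exactly the negative log-probability used in (1).
assert np.isclose(nll, ce)
```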
Equivalence of Training Objectives
True or False: The mathematical equivalence between minimizing cross-entropy loss and maximizing the auto-regressive log-likelihood for a target sequence holds true regardless of how the ground-truth labels are represented (e.g., one-hot vectors vs. smoothed probability distributions).
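A counterexample sketch for the smoothed-label case (toy logits; ε = 0.1 label smoothing, both chosen for illustration): with a one-hot target the cross-entropy equals the negative log-likelihood exactly, but with a smoothed target the two values diverge, so the equivalence does not hold for arbitrary label representations.

```python
import numpy as np

logits = np.array([2.0, 0.5, 1.0, -1.0, 0.0])  # arbitrary non-uniform prediction
log_probs = logits - np.log(np.exp(logits).sum())
target, V, eps = 2, len(logits), 0.1

one_hot = np.eye(V)[target]
smoothed = one_hot * (1 - eps) + eps / V       # label-smoothed distribution

nll = -log_probs[target]                       # negative log-likelihood of target
print(np.isclose(-(one_hot * log_probs).sum(), nll))    # True: exact equality
print(np.isclose(-(smoothed * log_probs).sum(), nll))   # False: they differ
```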
Comparing Language Model Training Objectives