Comparing Language Model Training Objectives
An AI researcher is comparing two language models, Model A and Model B, trained on the same dataset. The training objective for Model A was to minimize the average cross-entropy loss per token, while the objective for Model B was to maximize the average auto-regressive log-likelihood per token. Based on the final reported values below, which model performs better on the training data? Justify your answer by explaining the mathematical relationship between the two training objectives.
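For reference, a short sketch of the identity the question is probing, assuming the standard convention of one-hot targets: at each step t, cross-entropy against a one-hot target collapses to the negative log-probability of the true token, so the two per-token averages are exact negatives of one another.

```latex
% Average cross-entropy per token vs. average log-likelihood per token,
% for a target sequence y_1, ..., y_T under model p_theta:
\frac{1}{T}\sum_{t=1}^{T} H\!\left(\mathbf{1}_{y_t},\, p_\theta(\cdot \mid y_{<t})\right)
  \;=\; -\frac{1}{T}\sum_{t=1}^{T} \log p_\theta\!\left(y_t \mid y_{<t}\right)
```

Minimizing the left-hand side and maximizing the right-hand side are therefore the same optimization problem, so the reported values can be compared directly once one of them is negated.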
Tags
Ch.4 Alignment - Foundations of Large Language Models
Application in Bloom's Taxonomy
Related
A machine learning engineer is training a language model on a text corpus. During training, they plot two values at each step:
- The average negative log-likelihood of the target sequences.
- The cross-entropy loss between the model's predicted probability distributions and the one-hot encoded target tokens.
The engineer observes that the two plots are identical. Which of the following statements provides the most accurate mathematical justification for this observation?
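A minimal NumPy sketch of the observation (the toy distribution and variable names are illustrative, not from the original question): with a one-hot target, the cross-entropy sum has exactly one non-zero term, which is the negative log-likelihood of the target token.

```python
import numpy as np

# Toy predicted distribution over a 4-token vocabulary at one step.
probs = np.array([0.1, 0.6, 0.2, 0.1])
target = 1                      # index of the ground-truth token

# Negative log-likelihood of the target token.
nll = -np.log(probs[target])

# Cross-entropy against the one-hot encoding of the same target.
one_hot = np.eye(len(probs))[target]
cross_entropy = -(one_hot * np.log(probs)).sum()

assert np.isclose(nll, cross_entropy)  # identical, as the two plots suggest
print(nll, cross_entropy)              # both ~0.5108
```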
Equivalence of Training Objectives
True or False: The mathematical equivalence between minimizing cross-entropy loss and maximizing the auto-regressive log-likelihood for a target sequence holds true regardless of how the ground-truth labels are represented (e.g., one-hot vectors vs. smoothed probability distributions).
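A hedged sketch of why the label representation matters (again with illustrative toy values): with a smoothed target distribution q, the cross-entropy H(q, p) = -sum_i q_i log p_i picks up contributions from every vocabulary entry, so it no longer equals the negative log-likelihood of the single target token.

```python
import numpy as np

probs = np.array([0.1, 0.6, 0.2, 0.1])
target = 1

# Smoothed labels: put (1 - eps) on the target, spread eps uniformly.
eps = 0.1
smoothed = np.full(len(probs), eps / len(probs))
smoothed[target] += 1.0 - eps   # -> [0.025, 0.925, 0.025, 0.025]

nll = -np.log(probs[target])                       # ~0.5108
cross_entropy = -(smoothed * np.log(probs)).sum()  # ~0.6279

print(nll, cross_entropy)  # no longer equal once the labels are smoothed
```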