Case Study

Comparing Language Model Training Objectives

An AI researcher is comparing two language models, Model A and Model B, trained on the same dataset. The training objective for Model A was to minimize the average cross-entropy loss per token, while the objective for Model B was to maximize the average autoregressive log-likelihood per token. Based on the final reported values below, which model performs better on the training data? Justify your answer by explaining the mathematical relationship between the two training objectives.
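The relationship the question asks about can be checked numerically: the average cross-entropy loss per token is exactly the negative of the average log-likelihood per token, so the two objectives are equivalent up to sign. A toy sketch, using hypothetical per-token probabilities (not values from the case study):

```python
import math

# Hypothetical probabilities a model assigns to the correct next token
# at each position of a three-token sequence.
token_probs = [0.5, 0.25, 0.125]

# Model A's objective (minimized): average cross-entropy loss per token.
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Model B's objective (maximized): average autoregressive log-likelihood per token.
log_likelihood = sum(math.log(p) for p in token_probs) / len(token_probs)

# The two quantities differ only in sign: cross_entropy == -log_likelihood,
# so minimizing one is the same as maximizing the other.
print(cross_entropy, log_likelihood)
```

Because of this sign identity, comparing the two reported values only requires negating one of them before comparison.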


Updated 2025-10-10


Tags

Ch.4 Alignment - Foundations of Large Language Models Course

Application in Bloom's Taxonomy