Comparing Language Model Training Objectives
An AI researcher is comparing two language models, Model A and Model B, trained on the same dataset. The training objective for Model A was to minimize the average cross-entropy loss per token, while the objective for Model B was to maximize the average auto-regressive log-likelihood per token. Based on the final reported values below, which model performs better on the training data? Justify your answer by explaining the mathematical relationship between the two training objectives.
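For reference, a short sketch of the identity the question is probing, assuming the standard convention of one-hot targets: at each step t, cross-entropy against a one-hot target collapses to the negative log-probability of the true token, so the two per-token averages are exact negatives of one another.

```latex
% Average cross-entropy per token vs. average log-likelihood per token,
% for a target sequence y_1, ..., y_T under model p_theta:
\frac{1}{T}\sum_{t=1}^{T} H\!\left(\mathbf{1}_{y_t},\, p_\theta(\cdot \mid y_{<t})\right)
  \;=\; -\frac{1}{T}\sum_{t=1}^{T} \log p_\theta\!\left(y_t \mid y_{<t}\right)
```

Minimizing the left-hand side and maximizing the right-hand side are therefore the same optimization problem, so the reported values can be compared directly once one of them is negated.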
Tags
Ch.4 Alignment - Foundations of Large Language Models
Application in Bloom's Taxonomy
Related
A machine learning engineer is training a language model on a text corpus. During training, they plot two values at each step:
- The average negative log-likelihood of the target sequences.
- The cross-entropy loss between the model's predicted probability distributions and the one-hot encoded target tokens.
The engineer observes that the two plots are identical. Which of the following statements provides the most accurate mathematical justification for this observation?
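A minimal NumPy sketch of the observation (the toy distribution and variable names are illustrative, not from the original question): with a one-hot target, the cross-entropy sum has exactly one non-zero term, which is the negative log-likelihood of the target token.

```python
import numpy as np

# Toy predicted distribution over a 4-token vocabulary at one step.
probs = np.array([0.1, 0.6, 0.2, 0.1])
target = 1                      # index of the ground-truth token

# Negative log-likelihood of the target token.
nll = -np.log(probs[target])

# Cross-entropy against the one-hot encoding of the same target.
one_hot = np.eye(len(probs))[target]
cross_entropy = -(one_hot * np.log(probs)).sum()

assert np.isclose(nll, cross_entropy)  # identical, as the two plots suggest
print(nll, cross_entropy)              # both ~0.5108
```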
Equivalence of Training Objectives
True or False: The mathematical equivalence between minimizing cross-entropy loss and maximizing the auto-regressive log-likelihood for a target sequence holds true regardless of how the ground-truth labels are represented (e.g., one-hot vectors vs. smoothed probability distributions).
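A hedged sketch of why the label representation matters (again with illustrative toy values): with a smoothed target distribution q, the cross-entropy H(q, p) = -sum_i q_i log p_i picks up contributions from every vocabulary entry, so it no longer equals the negative log-likelihood of the single target token.

```python
import numpy as np

probs = np.array([0.1, 0.6, 0.2, 0.1])
target = 1

# Smoothed labels: put (1 - eps) on the target, spread eps uniformly.
eps = 0.1
smoothed = np.full(len(probs), eps / len(probs))
smoothed[target] += 1.0 - eps   # -> [0.025, 0.925, 0.025, 0.025]

nll = -np.log(probs[target])                       # ~0.5108
cross_entropy = -(smoothed * np.log(probs)).sum()  # ~0.6279

print(nll, cross_entropy)  # no longer equal once the labels are smoothed
```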