A machine learning engineer is training a language model on a text corpus. During training, they plot two values at each step:
- The average negative log-likelihood of the target sequences.
- The cross-entropy loss between the model's predicted probability distributions and the one-hot encoded target tokens.
The engineer observes that the two plots are identical. Which of the following statements provides the most accurate mathematical justification for this observation?
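To make the observed identity concrete, here is a minimal numeric sketch (the vocabulary size and probability values are made up for illustration): with a one-hot target, the cross-entropy sum collapses to the single term for the target token, which is exactly the sequence's negative log-likelihood at that step.

```python
import numpy as np

# Hypothetical example: a vocabulary of 4 tokens and the model's predicted
# probability distribution at one time step.
probs = np.array([0.1, 0.6, 0.2, 0.1])
target_index = 1                        # the ground-truth token
one_hot = np.eye(len(probs))[target_index]

# Cross-entropy with a one-hot target: H(p, q) = -sum_i p_i * log(q_i).
# Every term with p_i = 0 vanishes, leaving only the target token's term.
cross_entropy = -np.sum(one_hot * np.log(probs))

# Negative log-likelihood of the target token: -log q(target).
nll = -np.log(probs[target_index])

print(cross_entropy, nll)               # both print ~0.5108
assert np.isclose(cross_entropy, nll)   # the two losses coincide exactly
```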
Tags
Ch.4 Alignment - Foundations of Large Language Models
Related
Equivalence of Training Objectives
True or False: The mathematical equivalence between minimizing cross-entropy loss and maximizing the auto-regressive log-likelihood for a target sequence holds regardless of how the ground-truth labels are represented (e.g., one-hot vectors vs. smoothed probability distributions). (See the sketch after this list.)
Comparing Language Model Training Objectives
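As a companion to the true/false item above, the following sketch (again with made-up probabilities, and a smoothing strength eps chosen arbitrarily) shows why the label representation matters: once the target distribution is smoothed rather than one-hot, the cross-entropy picks up contributions from every vocabulary entry and no longer equals the negative log-likelihood of the target token.

```python
import numpy as np

# Hypothetical example: same predicted distribution as before, but now the
# target is a label-smoothed distribution instead of a one-hot vector.
probs = np.array([0.1, 0.6, 0.2, 0.1])
target_index = 1
eps = 0.1                               # smoothing strength (assumed value)
smoothed = np.full(len(probs), eps / len(probs))
smoothed[target_index] += 1.0 - eps     # e.g. [0.025, 0.925, 0.025, 0.025]

# Cross-entropy against the smoothed target sums over all vocabulary entries.
cross_entropy = -np.sum(smoothed * np.log(probs))

# Negative log-likelihood still looks only at the target token.
nll = -np.log(probs[target_index])

print(cross_entropy, nll)               # ~0.6279 vs ~0.5108: they differ
assert not np.isclose(cross_entropy, nll)
```

The gap between the two numbers illustrates that the equivalence in the main question depends specifically on the one-hot encoding of the targets.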