A model is being trained by maximizing the sum of log-probabilities assigned to the correct outputs over a dataset of 1,000 examples. Consider two scenarios for a single training update:
Scenario A: The probability assigned to the correct output for one example improves from 0.1 to 0.2. The probabilities for all other 999 examples remain unchanged.
Scenario B: The probability assigned to the correct output for one example improves from 0.8 to 0.9. The probabilities for all other 999 examples remain unchanged.
Which scenario leads to a larger increase in the overall training objective function, and why?
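A quick way to check (a minimal sketch; the variable names are my own): since the other 999 terms are unchanged, the change in the objective reduces to log(p_new) − log(p_old) for the single example that moved.

```python
import math

# Change in the log-likelihood objective is log(p_new) - log(p_old)
# for the one example whose probability changed; all other 999 terms cancel.
delta_a = math.log(0.2) - math.log(0.1)  # Scenario A: 0.1 -> 0.2
delta_b = math.log(0.9) - math.log(0.8)  # Scenario B: 0.8 -> 0.9

print(f"Scenario A gain: {delta_a:.4f}")  # ln(2)   ~= 0.6931
print(f"Scenario B gain: {delta_b:.4f}")  # ln(9/8) ~= 0.1178
```

Scenario A increases the objective by ln 2 ≈ 0.693, while Scenario B adds only ln(9/8) ≈ 0.118: the logarithm rewards multiplicative improvements, and moving from 0.1 to 0.2 doubles the probability, whereas 0.8 to 0.9 is only a 1.125× improvement.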
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Language Model as a Stochastic Policy
Plackett-Luce Loss Function
Model Comparison using Conditional Log-Likelihood
Evaluating a Training Update