Analysis of Ranking Error Penalties
A team is training a model to rank a list of 10 possible text completions. The training objective is to minimize a loss function defined as the negative log-probability of the ground-truth ranked sequence. The team observes that when the model incorrectly swaps the top two completions (placing the best at rank 2 and the second-best at rank 1), the penalty is much larger than when it incorrectly swaps the bottom two completions (placing the 9th-best at rank 10 and the 10th-best at rank 9). Analyze the mathematical structure of this loss function to explain this observation.
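Working out which terms of the loss each swap actually touches is a concrete first step in this analysis. The sketch below assumes the loss is a Plackett-Luce negative log-likelihood (the model named in the related item further down), in which -log Pr(Y|x) is a sum of one term per rank position. Each term normalizes only over the completions not yet placed. The per-item scores, the swap setup, and the helper names `pl_position_terms`, `swap`, and `changed_positions` are hypothetical illustrations, not the course's implementation.

```python
import math
from typing import List, Sequence


def pl_position_terms(scores: Sequence[float], ranking: Sequence[int]) -> List[float]:
    """Per-position terms of a Plackett-Luce negative log-likelihood (assumed model).

    scores[i] is a hypothetical model score for item i; ranking lists item
    indices from rank 1 (best) to rank n (worst). Term k equals
    log(sum of exp(score) over items not yet placed) - score of ranking[k],
    so the loss -log Pr(Y|x) is the sum of all terms.
    """
    terms = []
    for k, item in enumerate(ranking):
        remaining = ranking[k:]  # candidates still unplaced at position k + 1
        m = max(scores[j] for j in remaining)  # shift for a stable log-sum-exp
        log_z = m + math.log(math.fsum(math.exp(scores[j] - m) for j in remaining))
        terms.append(log_z - scores[item])
    return terms


def swap(scores: Sequence[float], a: int, b: int) -> List[float]:
    """Return a copy of scores with the scores of items a and b exchanged."""
    out = list(scores)
    out[a], out[b] = out[b], out[a]
    return out


def changed_positions(scores, base, ranking, tol=1e-12) -> List[int]:
    """1-indexed positions whose loss term differs between two score vectors."""
    new, old = pl_position_terms(scores, ranking), pl_position_terms(base, ranking)
    return [k + 1 for k, (t, u) in enumerate(zip(new, old)) if abs(t - u) > tol]


truth = list(range(10))                   # item 0 is best, item 9 is worst
base = [float(9 - i) for i in range(10)]  # hypothetical scores that agree with truth

for name, scores in [("top", swap(base, 0, 1)), ("bottom", swap(base, 8, 9))]:
    pos = changed_positions(scores, base, truth)
    sizes = [len(truth) - k + 1 for k in pos]  # candidates each changed term sums over
    print(f"{name} swap changes positions {pos} (candidate-set sizes {sizes})")
```

Under these assumptions, the top swap alters the terms for positions 1 and 2, which normalize over 10 and 9 candidates. The bottom swap alters only position 9, which normalizes over 2 candidates. The final term is always zero because only one completion remains. Comparing how large those changed terms can become is the core of the analysis the question asks for.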
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained on a preference dataset. For a single input prompt x, the ground-truth ranked sequence of responses is Y. The model calculates the probability of observing this exact sequence as Pr(Y|x) = 0.25. Based on the formula for the objective function that maximizes the likelihood of the model predicting the correct rankings, what is the loss value for this single data point?
Model Performance Evaluation using Plackett-Luce Loss
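The related question about a data point with Pr(Y|x) = 0.25 reduces to a single line of arithmetic. This assumes the per-example loss is -log Pr(Y|x), the negative log-likelihood described in the first question. The sketch below also assumes the natural logarithm, because the question does not state the base (in base 2 the value would be exactly 2). `ranking_nll` is a hypothetical helper name.

```python
import math


def ranking_nll(prob_of_ranking: float) -> float:
    """Loss for one data point: -log Pr(Y|x), natural log assumed."""
    if not 0.0 < prob_of_ranking <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(prob_of_ranking)


print(round(ranking_nll(0.25), 4))  # -ln 0.25 = ln 4, prints 1.3863
```

Under that assumption the loss is ln 4 ≈ 1.386. A higher probability for the ground-truth ranking would give a smaller loss, reaching 0 when Pr(Y|x) = 1.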