Based on the provided data and the objective of minimizing the negative expected log-probability of the ground-truth sequences, which model is performing better on this dataset? Justify your answer.

Google

Given the log-probability $$\log \Pr(\mathring{Y} | \mathbf{x})$$ of a ground-truth ordered list $$\mathring{Y}$$ conditioned on an input $$\mathbf{x}$$, the loss function based on the Plackett-Luce model is defined as the expected negative log-probability over the preference dataset $$\mathcal{D}_r$$. The formula is:
$$ \mathcal{L}_{\mathrm{pl}} = -\mathbb{E}_{(\mathbf{x},\mathring{Y}) \sim \mathcal{D}_r} \big[ \log \Pr(\mathring{Y} | \mathbf{x}) \big] $$

Plackett-Luce Loss Formula

A team is training a model to rank a list of 10 possible text completions. The training objective is to minimize a loss function defined as the negative log-probability of the ground-truth ranked sequence. The team observes that when the model incorrectly swaps the top two completions (placing the best at rank 2 and the second-best at rank 1), the penalty is much larger than when it incorrectly swaps the bottom two completions (placing the 9th-best at rank 10 and the 10th-best at rank 9). Analyze the mathematical structure of this loss function to explain this observation.

Analysis of Ranking Error Penalties

A language model is being trained on a preference dataset. For a single input prompt, the ground-truth ranked sequence of responses is `Y`. The model calculates the probability of observing this exact sequence as `Pr(Y|x) = 0.25`. Based on the formula for the objective function that maximizes the likelihood of the model predicting the correct rankings, what is the loss value for this single data point?

Learn Before

Related