1Cademy - A language model is being trained on a preference dataset. For a single input prompt, the ground-truth ranked sequence of responses is `Y`. The model calculates the probability of observing this exact sequence as `Pr(Y|x) = 0.25`. Based on the formula for the objective function that maximizes the likelihood of the model predicting the correct rankings, what is the loss value for this single data point?

Learn Before

Plackett-Luce Loss Formula

Multiple Choice

A language model is being trained on a preference dataset. For a single input prompt, the ground-truth ranked sequence of responses is `Y`. The model calculates the probability of observing this exact sequence as `Pr(Y|x) = 0.25`. Based on the formula for the objective function that maximizes the likelihood of the model predicting the correct rankings, what is the loss value for this single data point?
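
One way to check the arithmetic is sketched below. It assumes the Plackett-Luce objective reduces, for a single data point, to the negative log-likelihood `-log Pr(Y|x)`, and that the logarithm is the natural log; neither detail is spelled out in the question itself. The function name `plackett_luce_loss` is hypothetical and used only for illustration.

```python
import math


def plackett_luce_loss(sequence_prob: float) -> float:
    """Negative log-likelihood of one ranked sequence.

    Assumption: loss = -ln Pr(Y|x), i.e. natural log, single data point.
    """
    if not 0.0 < sequence_prob <= 1.0:
        raise ValueError("sequence_prob must be in (0, 1]")
    return -math.log(sequence_prob)


# Probability the model assigns to the ground-truth ranking Y.
loss = plackett_luce_loss(0.25)
print(f"{loss:.4f}")  # 1.3863, which equals ln(4)
```

Under these assumptions the loss is about 1.386. If the logarithm were taken in base 2 instead, the same probability would give a loss of exactly 2, so the answer depends on which base the formula uses.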

Updated 2025-10-03
