Learn Before
Listwise Loss from Accumulated Pairwise Comparisons
Modeling Pairwise Preference Probability with a Reward Function
Listwise Loss Formula from Accumulated Pairwise Comparisons
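The listwise loss, derived from aggregating pairwise comparisons, is formally defined as the negative expected log-likelihood over all distinct pairs in a ranked list. The formula is:
$$\mathcal{L}_{\text{list}} = -\mathbb{E}_{(x, Y) \sim \mathcal{D}_r} \left[ \frac{1}{N(N-1)} \sum_{\substack{y_a \in Y,\, y_b \in Y \\ y_a \neq y_b}} \log \Pr(y_a \succ y_b \mid x) \right]$$
Here:
- $\mathcal{L}_{\text{list}}$ is the listwise loss.
- The expectation $\mathbb{E}_{(x, Y) \sim \mathcal{D}_r}$ is taken over samples from the preference dataset $\mathcal{D}_r$, where $Y$ is the ranked list of outputs for a prompt $x$.
- The summation aggregates the log probability $\log \Pr(y_a \succ y_b \mid x)$ of the ground-truth preference for every ordered pair of distinct outputs $(y_a, y_b)$ within the list $Y$.
- The term $\frac{1}{N(N-1)}$ serves as a normalization factor, averaging the loss over the total number of possible ordered pairs, where $N$ is the number of outputs in $Y$.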
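As a rough illustration of the definition above, the following Python sketch computes the loss contribution of a single ranked list. It rests on several assumptions that are not part of the definition itself: the pairwise probability is modeled as the sigmoid of a reward difference (as in the related card on modeling preference probability with a reward function), the rewards are passed in ground-truth rank order from best to worst, and for each ordered pair the log probability is taken in the direction that agrees with the ranking. The names `sigmoid` and `listwise_loss` are hypothetical helpers chosen for this example. The full loss would then be the average of this quantity over the lists in the preference dataset.

```python
import math
from itertools import permutations


def sigmoid(z):
    """Logistic function, used here as the assumed pairwise preference model."""
    return 1.0 / (1.0 + math.exp(-z))


def listwise_loss(rewards_best_to_worst):
    """Loss contribution of one ranked list (illustrative sketch).

    rewards_best_to_worst: reward scores r(x, y) for the outputs of one
    prompt, ordered by the ground-truth ranking (index 0 is the best output).
    """
    n = len(rewards_best_to_worst)
    if n < 2:
        raise ValueError("a ranked list needs at least two outputs")

    total_log_prob = 0.0
    # Visit all N(N-1) ordered pairs of distinct positions.
    for i, j in permutations(range(n), 2):
        # The lower index is ranked higher, so it is the ground-truth winner.
        winner, loser = (i, j) if i < j else (j, i)
        p = sigmoid(rewards_best_to_worst[winner] - rewards_best_to_worst[loser])
        total_log_prob += math.log(p)

    # Normalize by the number of ordered pairs and negate.
    return -total_log_prob / (n * (n - 1))


if __name__ == "__main__":
    agrees = listwise_loss([3.0, 2.0, 1.0, 0.0])     # scores follow the ranking
    disagrees = listwise_loss([0.0, 1.0, 2.0, 3.0])  # scores reverse the ranking
    print(agrees < disagrees)
```

Under these assumptions, rewards that follow the ground-truth ranking give a smaller loss than rewards that reverse it, which is the behavior the loss is meant to encourage.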

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Listwise Loss Formula from Accumulated Pairwise Comparisons
A human annotator is given four model-generated responses (A, B, C, D) to a prompt and ranks them in order of preference from best to worst as: C > A > D > B. To train a preference model, a loss function is calculated by summing the individual losses for every pairwise comparison implied by this ranking. Which of the following sets represents all the pairwise preferences that would be used in this loss calculation?
Decomposing a Ranked List into Pairwise Preferences
Evaluating Preference Model Performance with Listwise Loss
Listwise Loss Formula from Accumulated Pairwise Comparisons
Empirical Reward Model Loss Formula
Empirical Formulation of Pair-wise Ranking Loss
A system learns a function, r(input, response), that assigns a numerical score indicating the quality of a response for a given input. The probability that response Y_a is preferred over response Y_b is then calculated using the formula: Probability = Sigmoid(r(input, Y_a) - r(input, Y_b)), where Sigmoid(z) = 1 / (1 + e^-z). Given the following scenarios for a single input, which one presents a logical inconsistency between the assigned scores and the resulting preference probability?
Preference Probability Calculation
Invariance of Preference Probability
Learn After
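Consider the following formula for a loss function used to train a model on ranked lists of outputs, where $N$ is the number of items in a given list $Y$:
$$\mathcal{L}_{\text{list}} = -\mathbb{E}_{(x, Y) \sim \mathcal{D}_r} \left[ \frac{1}{N(N-1)} \sum_{\substack{y_a \in Y,\, y_b \in Y \\ y_a \neq y_b}} \log \Pr(y_a \succ y_b \mid x) \right]$$
What is the primary analytical consequence of including the normalization term $\frac{1}{N(N-1)}$ in this calculation?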
Applying the Listwise Loss Summation
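Consider the listwise loss formula used for training on ranked preferences:
$$\mathcal{L}_{\text{list}} = -\mathbb{E}_{(x, Y) \sim \mathcal{D}_r} \left[ \frac{1}{N(N-1)} \sum_{\substack{y_a \in Y,\, y_b \in Y \\ y_a \neq y_b}} \log \Pr(y_a \succ y_b \mid x) \right]$$
True or False: If a model is completely uncertain about the preferences within a ranked list (i.e., it assigns $\Pr(y_a \succ y_b \mid x) = 0.5$ for all distinct pairs), the contribution of that specific list to the overall loss will be zero.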