Empirical Formulation of Pair-wise Ranking Loss
The pair-wise ranking loss function for a reward model with parameters θ can be formulated by summing over samples from the preference dataset D. The expected loss incorporates the probability of drawing an input x and the conditional probability of drawing the preferred output pair (y_a, y_b) given x. Assuming a uniform distribution over the model inputs involved in sampling, the formula simplifies to the empirical form:

$$\mathcal{L}(\theta) = -\frac{1}{|D|} \sum_{(x,\, y_a,\, y_b) \in D} \log \sigma\big(r_\theta(x, y_a) - r_\theta(x, y_b)\big)$$

where $r_\theta(x, y)$ is the reward score the model assigns to response $y$ for input $x$, $y_a$ is the preferred response, $y_b$ is the rejected response, and $\sigma$ is the Sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$.
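As a concrete check of the formula, here is a minimal Python sketch; the function names, the numerically stable log-sigmoid helper, and the example scores are illustrative assumptions, not taken from the source:

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)):
    #   z >= 0: -log(1 + exp(-z));  z < 0: z - log(1 + exp(z))
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def pairwise_ranking_loss(score_pairs):
    # Empirical loss: the average of -log(sigmoid(r_a - r_b)) over all
    # (preferred, rejected) reward-score pairs drawn from the dataset D.
    total = sum(-log_sigmoid(r_a - r_b) for r_a, r_b in score_pairs)
    return total / len(score_pairs)

# Hypothetical reward-score pairs (r(x, y_a), r(x, y_b)), with y_a preferred.
pairs = [(2.3, 0.8), (1.1, 1.0), (0.5, 1.9)]
print(pairwise_ranking_loss(pairs))  # ~0.82; the mis-ordered third pair dominates
```

Note how the third pair, where the preferred response scores below the rejected one, contributes by far the largest term; the loss is smallest when the model cleanly ranks the preferred response higher.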
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Empirical Pair-wise Ranking Loss for RLHF Reward Model
Regularized Pairwise Loss Function for Reward Model Training
A reward model is being trained to prefer one machine-generated text response over another for a given input. The training process aims to minimize a loss function calculated as the negative logarithm of a sigmoid applied to the difference between the reward scores of the preferred (y_a) and non-preferred (y_b) responses. Given the following reward scores assigned by the model to a single pair of responses, which scenario contributes the least to the total loss, indicating the model is correctly differentiating between the responses?
Diagnosing Reward Model Training Issues
Analyzing Reward Model Performance via Loss Function
Listwise Loss Formula from Accumulated Pairwise Comparisons
Empirical Reward Model Loss Formula
A system learns a function, r(input, response), that assigns a numerical score indicating the quality of a response for a given input. The probability that response Y_a is preferred over response Y_b is then calculated using the formula: Probability = Sigmoid(r(input, Y_a) - r(input, Y_b)), where Sigmoid(z) = 1 / (1 + e^-z). Given the following scenarios for a single input, which one presents a logical inconsistency between the assigned scores and the resulting preference probability?
Preference Probability Calculation
Invariance of Preference Probability
Learn After
A reward model is being trained using a pair-wise ranking loss function. For a given prompt x, the preference dataset contains a pair of responses: a preferred response y_pref and a rejected response y_rej. Initially, the model assigns the following scores: R(x, y_pref) = 2.0 and R(x, y_rej) = 3.0. Based on the objective of the loss function, what is the most likely change to these scores after a single optimization step on this data point? (See the numeric sketch after this list.)
Analysis of a Weighted Ranking Loss
Handling Labeler Disagreement in Reward Modeling
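For the "Learn After" question above, with R(x, y_pref) = 2.0 and R(x, y_rej) = 3.0, the expected behavior can be sketched numerically. This is a simplified illustration that treats the two scores as directly trainable values rather than outputs of a parameterized model, with a made-up learning rate:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

r_pref, r_rej = 2.0, 3.0   # scores from the question above
lr = 0.5                   # hypothetical learning rate

# Loss on this single pair: -log(sigmoid(r_pref - r_rej)) ~ 1.31,
# large because the rejected response is currently scored higher.
p = sigmoid(r_pref - r_rej)   # ~0.269
grad = p - 1.0                # dLoss/dr_pref; dLoss/dr_rej is -grad

r_pref -= lr * grad           # 2.0 -> ~2.366: preferred score rises
r_rej  += lr * grad           # 3.0 -> ~2.634: rejected score falls

print(round(r_pref, 3), round(r_rej, 3))
```

A single step pushes the preferred score up and the rejected score down, shrinking the incorrectly ordered gap; in actual reward-model training the same gradients flow into the parameters θ rather than into the scores themselves.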