1Cademy - Calculating Pair-wise Ranking Loss

Learn Before

Empirical Pair-wise Ranking Loss for RLHF Reward Model

Case Study

Calculating Pair-wise Ranking Loss

A reward model is being trained on a dataset of human preferences. For one specific data point in the dataset, the model is given a prompt ($x$), a human-preferred response ($y_a$), and a human-dispreferred response ($y_b$). The model assigns the following scalar scores:

Score for preferred response, $r(x, y_a) = 2.0$
Score for dispreferred response, $r(x, y_b) = 2.0$

The loss for this single data point is calculated using the formula: $\mathcal{L} = - \log \sigma(r(x,y_a) - r(x,y_b))$

Here $\sigma$ is the sigmoid function, $\sigma(z) = 1 / (1 + e^{-z})$, and $\log$ is the natural logarithm.

Calculate the loss value $\mathcal{L}$ for this specific data point. Explain the significance of the resulting loss value in the context of training this model. (You may use the approximation $\ln(0.5) \approx -0.693$.)
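
As a way to check the arithmetic, the formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original case study: the function names `sigmoid` and `pairwise_ranking_loss` are hypothetical, and the code applies the formula directly rather than using a numerically stabilized form.

```python
import math


def sigmoid(z: float) -> float:
    # Sigmoid as defined in the case study: 1 / (1 + e^{-z}).
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_ranking_loss(r_preferred: float, r_dispreferred: float) -> float:
    # L = -log(sigma(r(x, y_a) - r(x, y_b))), with the natural logarithm.
    # Hypothetical helper name; the arguments are the two scalar scores.
    return -math.log(sigmoid(r_preferred - r_dispreferred))


# Scores from the case study: both responses receive 2.0.
loss = pairwise_ranking_loss(2.0, 2.0)
print(round(loss, 3))  # 0.693
```

With equal scores the difference is 0, so $\sigma(0) = 0.5$ and $\mathcal{L} = -\ln(0.5) \approx 0.693$. Calling the same function with a larger margin in favor of the preferred response, for example `pairwise_ranking_loss(3.0, 1.0)`, returns a smaller value, which is the direction training pushes the scores.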

Updated 2025-10-03
