Case Study

Calculating Pair-wise Ranking Loss

A reward model is being trained on a dataset of human preferences. For one specific data point in the dataset, the model is given a prompt ($x$), a human-preferred response ($y_a$), and a human-dispreferred response ($y_b$). The model assigns the following scalar scores:

  • Score for preferred response, $r(x, y_a) = 2.0$
  • Score for dispreferred response, $r(x, y_b) = 2.0$

The loss for this single data point is calculated using the formula: $\mathcal{L} = -\log \sigma(r(x, y_a) - r(x, y_b))$

Here $\sigma$ is the sigmoid function, $\sigma(z) = 1 / (1 + e^{-z})$, and $\log$ is the natural logarithm.

Calculate the loss value $\mathcal{L}$ for this specific data point. Explain the significance of the resulting loss value in the context of training this model. (You may use the approximation $\ln(0.5) \approx -0.693$.)
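
To check a hand calculation, the formula above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the original exercise: the function names `sigmoid` and `pairwise_ranking_loss` are hypothetical, and the scores in the usage line are made-up values rather than the ones given in the case study.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid: sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_ranking_loss(r_preferred: float, r_dispreferred: float) -> float:
    """Loss for one preference pair: -log(sigma(r_preferred - r_dispreferred)).

    math.log is the natural logarithm, matching the formula in the text.
    """
    return -math.log(sigmoid(r_preferred - r_dispreferred))


# Hypothetical scores for illustration only (not the case-study values).
example_loss = pairwise_ranking_loss(1.5, 0.5)
```

Because the loss depends only on the difference between the two scores, the function takes the scores directly and not the prompt or the responses. For very large negative score differences, the direct form can lose precision, so a production implementation would likely use a numerically stable log-sigmoid instead.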

Updated 2025-10-03

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science