1Cademy - Analyzing Reward Model Performance via Loss Function

Learn Before

Pair-wise Ranking Loss Formula for RLHF Reward Model

Short Answer

Analyzing Reward Model Performance via Loss Function

A machine learning engineer is training a reward model with a pairwise ranking loss. They observe that the loss stays consistently high and does not decrease. The cause is that the model assigns very similar reward scores to the preferred and dispreferred responses for any given input. Referring to the components of the loss function Loss = -E[log(Sigmoid(R_preferred - R_dispreferred))], explain why this scenario leads to a high loss value.
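
The numeric sketch below evaluates the formula on a few pairs. It assumes Python, made-up reward scores, and the sample mean as a stand-in for the expectation E[...]. `sigmoid` and `pairwise_ranking_loss` are illustrative helpers, not part of any library. When R_preferred is close to R_dispreferred, the difference is near 0, Sigmoid(0) = 0.5, and each pair contributes about -log(0.5) = ln 2 ≈ 0.693. A wide positive margin pushes Sigmoid toward 1 and the term toward 0.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_ranking_loss(pairs):
    """Mean of -log(sigmoid(r_preferred - r_dispreferred)) over the pairs.

    Each pair is (r_preferred, r_dispreferred). The sample mean stands in
    for the expectation E[...] in the loss formula (an assumption of this sketch).
    """
    losses = [-math.log(sigmoid(r_p - r_d)) for r_p, r_d in pairs]
    return sum(losses) / len(losses)

# Hypothetical reward scores, chosen only for illustration.
similar = [(1.00, 0.99), (-0.30, -0.30), (2.50, 2.52)]  # near-identical scores
separated = [(3.0, -1.0), (2.0, -2.0), (1.5, -3.5)]     # preferred clearly higher

print(f"similar scores:   {pairwise_ranking_loss(similar):.4f}")
print(f"separated scores: {pairwise_ranking_loss(separated):.4f}")
print(f"ln 2:             {math.log(2):.4f}")
```

With near-identical scores the average loss sits close to ln 2, whereas clearly separated scores give a loss close to 0.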


Updated 2025-10-09
