1Cademy - When training a reward model on segment-level scores using a regression loss, the primary objective is to ensure the model's predicted scores for different segments maintain the same relative order (ranking) as the target scores, even if the absolute values of the predictions are consistently different from the targets.

Learn Before

Training a Reward Model on Segment-Level Scores via Regression Loss

True/False

When training a reward model on segment-level scores using a regression loss, the primary objective is to ensure the model's predicted scores for different segments maintain the same relative order (ranking) as the target scores, even if the absolute values of the predictions are consistently different from the targets.
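
One way to examine this statement is to compare what a regression loss measures with what a ranking check measures. The sketch below assumes mean squared error as the regression loss (the source does not name a specific loss) and uses made-up target scores; the helper names `mse` and `same_ranking` are hypothetical and chosen only for illustration.

```python
def mse(pred, target):
    """Mean squared error, one common choice of regression loss (assumed here)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def same_ranking(pred, target):
    """Return True if sorting segments by prediction gives the same order as sorting by target."""
    def order(xs):
        return sorted(range(len(xs)), key=lambda i: xs[i])
    return order(pred) == order(target)


# Hypothetical segment-level target scores (illustrative values only).
target = [0.2, 0.5, 0.9]

# Predictions that keep the same relative order but are offset by a constant.
shifted = [t + 1.0 for t in target]

print(same_ranking(shifted, target))  # ranking is preserved
print(mse(shifted, target))           # loss is close to 1.0, not zero
print(mse(target, target))            # loss is zero only when the values match
```

Under this assumption, predictions with a constant offset still incur a nonzero loss even though their ranking matches the targets, which suggests that a regression loss is driven by the score values themselves rather than only by their relative order.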

Updated 2025-10-10
