1Cademy - A reward model is being trained to learn human preferences by minimizing a ranking loss function. This function penalizes the model when the score it assigns to a human-preferred response is not higher than the score for a less-preferred response. Given the same prompt, which of the following scoring outcomes for a preferred/less-preferred pair would incur a penalty from the loss function?

Model A: score(x, y_preferred) = 3.2, score(x, y_rejected) = 1.5
Model B: score(x, y_preferred) = -0.5, score(x, y_rejected) = -2.0
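
The two pairs listed above can be checked numerically. The sketch below assumes a Bradley-Terry style pairwise loss, -log(sigmoid(score_preferred - score_rejected)), which is a common choice for reward models but is not specified by the question. The function names `pairwise_ranking_loss` and `is_misranked` are hypothetical and used only for illustration.

```python
import math


def pairwise_ranking_loss(score_preferred: float, score_rejected: float) -> float:
    """Assumed loss form: -log(sigmoid(score_preferred - score_rejected))."""
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def is_misranked(score_preferred: float, score_rejected: float) -> bool:
    """True when the preferred response is NOT scored strictly higher."""
    return score_preferred <= score_rejected


# The two scoring outcomes listed in the question.
pairs = {
    "Model A": (3.2, 1.5),
    "Model B": (-0.5, -2.0),
}

for name, (preferred, rejected) in pairs.items():
    loss = pairwise_ranking_loss(preferred, rejected)
    print(f"{name}: margin={preferred - rejected:.2f}, "
          f"loss={loss:.4f}, misranked={is_misranked(preferred, rejected)}")
```

Only the difference between the two scores matters, not their sign or absolute size, so Model B's negative scores are not a problem in themselves. Both pairs score the preferred response higher, so neither is misranked. Under the assumed smooth loss the value is still nonzero for both, and slightly larger for Model B because its margin (1.5) is smaller than Model A's (1.7).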

Learn Before

Reward Model Training as a Ranking Problem in RLHF

Multiple Choice


Updated 2025-10-02
