1Cademy

Learn Before

Reward Model Training as a Ranking Problem in RLHF

Multiple Choice

An AI team is training a system to learn from human preferences. They have a dataset where for a given input x, humans consistently prefer response y_preferred over response y_rejected. After training, they test two different scoring models, Model A and Model B, on this pair. The models produce the following scores:

Model A: score(x, y_preferred) = 3.2, score(x, y_rejected) = 1.5
Model B: score(x, y_preferred) = -0.5, score(x, y_rejected) = -2.0

Based on these sco
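The answer choices are cut off above. Still, the scores can be checked against one common ranking objective for reward models. This is a minimal sketch that assumes a Bradley-Terry style pairwise loss, -log(sigmoid(score(x, y_preferred) - score(x, y_rejected))); the source does not confirm that this is the loss the team used. The function names `preference_probability` and `pairwise_ranking_loss` are hypothetical.

```python
import math

def preference_probability(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry probability that y_preferred beats y_rejected (assumed model)."""
    margin = score_preferred - score_rejected
    # Numerically stable sigmoid of the score margin
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)

def pairwise_ranking_loss(score_preferred: float, score_rejected: float) -> float:
    """Assumed pairwise loss: -log(sigmoid(margin)), written as log(1 + exp(-margin))."""
    margin = score_preferred - score_rejected
    # Stable evaluation of log(1 + exp(-margin)) for margins of either sign
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Scores from the question: (score(x, y_preferred), score(x, y_rejected))
scores = {
    "Model A": (3.2, 1.5),
    "Model B": (-0.5, -2.0),
}

for name, (s_pref, s_rej) in scores.items():
    print(
        f"{name}: margin={s_pref - s_rej:.2f}, "
        f"P(preferred)={preference_probability(s_pref, s_rej):.4f}, "
        f"loss={pairwise_ranking_loss(s_pref, s_rej):.4f}"
    )
```

Under this assumption only the difference between the two scores matters. Adding the same constant to both scores leaves the loss unchanged, so the negative sign of Model B's raw scores is irrelevant. Both models rank y_preferred above y_rejected. Model A's margin (1.7) is somewhat larger than Model B's (1.5), so Model A has a slightly lower loss.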

Updated 2025-09-26
