True or False: When training a reward model using the loss function L = E[(human_score - predicted_reward)^2], the primary objective is to ensure that for any two outputs, the one with the higher human score also receives a higher predicted reward from the model.
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Calculating Pointwise Reward Model Loss
A machine learning engineer is training a reward model where human annotators assign an absolute quality score to each generated text. The engineer considers switching the loss function from Mean Squared Error (MSE), which calculates
(human_score - predicted_reward)^2, to Mean Absolute Error (MAE), which calculates |human_score - predicted_reward|. What is the most significant consequence of this change on the reward model's learned behavior?
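The contrast between the two pointwise losses can be sketched in plain Python. This is an illustrative example, not an implementation from any library; the names `mse_loss`, `mae_loss`, and the sample scores are invented for the demonstration. The key behavioral difference: MSE squares each residual, so a single large annotation error dominates the loss, while MAE penalizes residuals linearly and is more robust to such outliers.

```python
def mse_loss(human_scores, predicted_rewards):
    """Mean Squared Error: squares each residual, so outliers dominate the loss."""
    n = len(human_scores)
    return sum((h - p) ** 2 for h, p in zip(human_scores, predicted_rewards)) / n

def mae_loss(human_scores, predicted_rewards):
    """Mean Absolute Error: penalizes residuals linearly, more robust to outliers."""
    n = len(human_scores)
    return sum(abs(h - p) for h, p in zip(human_scores, predicted_rewards)) / n

# Hypothetical scores: the last example carries a large annotation error (-3.0).
human = [4.0, 3.0, 5.0, 1.0]
pred  = [3.5, 3.0, 4.5, 4.0]

print(mse_loss(human, pred))  # 2.375 -- the single 3.0 residual contributes 9/4
print(mae_loss(human, pred))  # 1.0   -- the same residual contributes only 3/4
```

Under MSE the one bad label accounts for most of the loss (9 of 9.5 before averaging), whereas under MAE it contributes proportionally, which is the behavioral shift the question asks about.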