Calculating Pointwise Reward Model Loss
Using the data provided in the case study, calculate the final loss value for this batch. Assume the training process uses the mean of the squared differences between the human scores and the model's predicted rewards as its loss function. Explain the steps of your calculation.
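The loss described above is the mean squared error over the batch: L = (1/N) * sum over i of (human_score_i - predicted_reward_i)^2. A minimal Python sketch of the three steps (take the difference, square it, average) follows. The case study data is not reproduced on this page, so the batch values and the function name pointwise_mse_loss are hypothetical placeholders for illustration only; substitute the case study's scores to obtain the actual answer.

```python
def pointwise_mse_loss(human_scores, predicted_rewards):
    """Mean of squared differences between human scores and predicted rewards."""
    if len(human_scores) != len(predicted_rewards):
        raise ValueError("score and reward lists must have the same length")
    if not human_scores:
        raise ValueError("batch must contain at least one example")

    # Step 1: per-example error (human score minus predicted reward).
    errors = [s - r for s, r in zip(human_scores, predicted_rewards)]
    # Step 2: square each error so positive and negative errors both count.
    squared_errors = [e ** 2 for e in errors]
    # Step 3: average the squared errors over the batch.
    return sum(squared_errors) / len(squared_errors)


# Hypothetical placeholder batch (NOT the case-study data).
human_scores = [4.0, 2.0, 5.0, 1.0]
predicted_rewards = [3.0, 2.0, 3.0, 2.0]

# Errors: 1, 0, 2, -1 -> squared: 1, 0, 4, 1 -> mean: 6 / 4
loss = pointwise_mse_loss(human_scores, predicted_rewards)
print(loss)  # 1.5
```

With these placeholder values, the single example that is off by 2 contributes 4 of the 6 total squared-error units, which shows how squaring weights larger errors more heavily than smaller ones.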
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Calculating Pointwise Reward Model Loss
A machine learning engineer is training a reward model where human annotators assign an absolute quality score to each generated text. The engineer considers switching the loss function from Mean Squared Error (MSE), which calculates (human_score - predicted_reward)^2, to Mean Absolute Error (MAE), which calculates |human_score - predicted_reward|. What is the most significant consequence of this change on the reward model's learned behavior?
True or False: When training a reward model using the loss function L = E[(human_score - predicted_reward)^2], the primary objective is to ensure that for any two outputs, the one with the higher human score also receives a higher predicted reward from the model.