1Cademy - Calculating Pointwise Reward Model Loss

Learn Before

Pointwise Loss Function for Reward Model Training

Case Study

Calculating Pointwise Reward Model Loss

Using the data provided in the case study, calculate the final loss value for this batch. Assume the training process uses the mean of the squared differences between the human scores and the model's predicted rewards as its loss function. Explain the steps of your calculation.
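The calculation described above can be sketched in Python as a minimal illustration. The function name `pointwise_mse_loss` and the example values are hypothetical and are not the case study's data. The sketch takes the difference between each human score and the matching predicted reward, squares each difference, and averages the squares over the batch.

```python
def pointwise_mse_loss(human_scores, predicted_rewards):
    """Mean of squared differences between human scores and predicted rewards."""
    if len(human_scores) != len(predicted_rewards):
        raise ValueError("Each human score needs exactly one predicted reward.")
    if len(human_scores) == 0:
        raise ValueError("The batch must contain at least one example.")

    # Step 1: per-example difference (human score minus predicted reward).
    differences = [h - r for h, r in zip(human_scores, predicted_rewards)]

    # Step 2: square each difference so that sign does not matter.
    squared_differences = [d ** 2 for d in differences]

    # Step 3: average over the batch to get the final loss.
    return sum(squared_differences) / len(squared_differences)


# Hypothetical placeholder values, NOT the data from the case study.
example_scores = [4.0, 2.0, 5.0, 3.0]
example_rewards = [3.0, 3.0, 3.0, 3.0]

# Differences: 1, -1, 2, 0 -> squares: 1, 1, 4, 0 -> mean: 6 / 4
example_loss = pointwise_mse_loss(example_scores, example_rewards)
print(example_loss)  # 1.5
```

To answer the question, substitute the human scores and predicted rewards given in the case study for the placeholder lists.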

Updated 2025-10-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences