1Cademy - A machine learning engineer is training a reward model. The goal is for the models output, `r`, to be as close as possible to a set of human-provided target scores, `s`. The engineer chooses the following objective function to maximize for each data point: `L = - (s - r)^2`. Why is maximizing this objective function an effective strategy for achieving the engineers goal?

Learn Before

Pointwise Rating Loss (L_rating) Formula

Multiple Choice

A machine learning engineer is training a reward model. The goal is for the model's output, r, to be as close as possible to a set of human-provided target scores, s. The engineer chooses the following objective function to maximize for each data point: L = - (s - r)^2. Why is maximizing this objective function an effective strategy for achieving the engineer's goal?

Updated 2025-10-08

Contributors are:

Who are from:

Learn Before

Related