Negative Mean Squared Error Objective for Pointwise Reward Models
The objective function for a pointwise reward model can be formulated as the negative mean squared error between human-provided scores and the model's predictions. The formula is:

$$\mathcal{J}(\theta) = -\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\big(s(x,y) - r_{\theta}(x,y)\big)^{2}\right]$$

Here, $\mathcal{J}(\theta)$ represents the objective, $\mathbb{E}_{(x,y)\sim\mathcal{D}}$ is the expectation over the dataset $\mathcal{D}$, $s(x,y)$ is the score assigned by a human to response $y$ for prompt $x$, and $r_{\theta}(x,y)$ is the reward predicted by the model with parameters $\theta$. The negative sign indicates that maximizing this objective is equivalent to minimizing the standard mean squared error.
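A minimal sketch of this objective in plain Python (the function and variable names are illustrative, not taken from the course):

```python
# Negative-MSE objective for a pointwise reward model: the average squared
# gap between human scores s(x, y) and predicted rewards r_theta(x, y),
# negated so that maximizing it minimizes the error.

def negative_mse_objective(human_scores, predicted_rewards):
    """Return -MSE over paired human scores and model predictions."""
    assert len(human_scores) == len(predicted_rewards)
    squared_errors = [
        (s - r) ** 2 for s, r in zip(human_scores, predicted_rewards)
    ]
    return -sum(squared_errors) / len(squared_errors)

# Example: human labels vs. model predictions for three (prompt, response) pairs
scores = [8.0, 3.0, 6.0]
rewards = [7.5, 3.5, 6.0]
print(negative_mse_objective(scores, rewards))  # negative MSE; 0 means a perfect fit
```

A perfect model attains the maximum value of 0; any prediction error pushes the objective below zero.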

References
Reference of Foundations of Large Language Models Course
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pointwise Loss Function for Reward Model Training
Limitations of the Pointwise Method in RLHF
Comparison of Pointwise vs. Relative Preference Methods in RLHF
Suitable Applications for the Pointwise Method in RLHF
Negative Mean Squared Error Objective for Pointwise Reward Models
Conceptual Advantages of Pointwise Methods in RLHF
A research team is developing a reward model to score the quality of AI-generated poetry. Their team of human labelers consists of literary experts from diverse cultural backgrounds, leading to highly subjective and varied opinions on what constitutes 'good' poetry. Given this context, which of the following methods for collecting human feedback would likely introduce the most noise and inconsistency into the reward model's training data?
A team is training a reward model for a language model. They collect human feedback by presenting annotators with a single, model-generated response to a prompt and asking them to assign a quality score on a scale of 1 to 10. How does this data collection approach frame the learning task for the reward model?
Choosing a Feedback Collection Method
Learn After
A machine learning engineer is training a reward model where the goal is to align the model's predicted scores, $r_{\theta}(x,y)$, with human-provided scores, $s(x,y)$. The standard approach is to maximize the objective function $\mathcal{J}(\theta) = -\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\big(s(x,y) - r_{\theta}(x,y)\big)^{2}\right]$. Suppose the engineer makes a mistake and instead configures the training process to maximize the standard mean squared error, effectively removing the negative sign from the objective: $\mathcal{J}'(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\big(s(x,y) - r_{\theta}(x,y)\big)^{2}\right]$. What would be the most likely effect on the model's behavior during training?
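The sign-flip scenario above can be sketched numerically with a single scalar prediction trained by gradient ascent (the setup and names are illustrative, not from the course):

```python
# With J = -(s - r)^2, gradient ascent moves the prediction r toward the
# human score s. With the negation mistakenly removed (J = (s - r)^2),
# the same ascent step moves r away from s, so the error grows without bound.

def ascent_step(r, s, lr, negated=True):
    # dJ/dr for J = -(s - r)^2 is 2*(s - r); dropping the negation flips the sign.
    grad = 2 * (s - r) if negated else -2 * (s - r)
    return r + lr * grad

s = 5.0               # human-provided score
r_good = r_bad = 0.0  # initial predictions
for _ in range(50):
    r_good = ascent_step(r_good, s, lr=0.1, negated=True)   # correct objective
    r_bad = ascent_step(r_bad, s, lr=0.1, negated=False)    # sign removed

print(round(r_good, 3))  # converges toward 5.0
print(r_bad)             # diverges away from 5.0
```

The correct objective pulls the prediction to the target, while the sign-flipped objective actively maximizes the error, so training diverges rather than converges.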
Reward Model Objective Calculation
Pointwise Rating Loss (L_rating) Formula
In the context of training a model to predict scores for a given input-output pair, consider the following objective function: $\mathcal{J}(\theta) = -\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\big(s(x,y) - r_{\theta}(x,y)\big)^{2}\right]$. Match each component of the formula to its correct description.