Formula

Negative Mean Squared Error Objective for Pointwise Reward Models

The objective for a pointwise reward model can be written as the negative mean squared error between human-provided scores and the model's predictions:

$$\mathcal{L}_{\text{point}} = -\mathbb{E}\left[\left(\varphi(\mathbf{x}, \mathbf{y}) - r(\mathbf{x}, \mathbf{y})\right)^2\right]$$

Here, $\mathcal{L}_{\text{point}}$ is the objective, $\mathbb{E}$ is the expectation over the dataset, $\varphi(\mathbf{x}, \mathbf{y})$ is the score a human assigns to response $\mathbf{y}$ for prompt $\mathbf{x}$, and $r(\mathbf{x}, \mathbf{y})$ is the reward predicted by the model. Because of the negative sign, maximizing this objective is equivalent to minimizing the standard mean squared error.
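As a rough illustration of the objective above, the following minimal sketch estimates it on a small batch by replacing the expectation with a sample average. The function name `pointwise_objective` and the example scores are hypothetical, chosen only for illustration; a real reward model would produce the predicted rewards from a neural network rather than a fixed list.

```python
from statistics import fmean


def pointwise_objective(human_scores, predicted_rewards):
    """Negative mean squared error between human scores and predicted rewards.

    The expectation over the dataset is approximated by a sample mean.
    """
    if len(human_scores) != len(predicted_rewards):
        raise ValueError("both score lists must have the same length")
    if not human_scores:
        raise ValueError("at least one example is required")
    squared_errors = [
        (phi - r) ** 2 for phi, r in zip(human_scores, predicted_rewards)
    ]
    return -fmean(squared_errors)


# Hypothetical human scores and model predictions for four (prompt, response) pairs.
human_scores = [4.0, 2.0, 5.0, 1.0]
predicted_rewards = [3.5, 2.5, 4.0, 1.0]

objective = pointwise_objective(human_scores, predicted_rewards)
print(objective)
```

The value is never positive and reaches its maximum of zero only when every prediction matches the human score, which is why maximizing it pushes the model's rewards toward the human-provided ones.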

Updated 2026-05-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences