Formula

Pointwise Rating Loss (L_rating) Formula

The pointwise rating loss, denoted $L_{\text{rating}}$, is an objective function used to train a reward model by aligning its predictions with a target score. It is formulated as the negative mean squared error between a target score, $s(\bar{\mathbf{y}}_k)$, and the model's predicted reward, $r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k)$. The formula is:

$$L_{\text{rating}} = -\mathbb{E}_{\bar{\mathbf{y}}_k} [s(\bar{\mathbf{y}}_k) - r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k)]^2$$

Maximizing this objective function minimizes the squared difference between the target score and the model's reward. The expectation, $\mathbb{E}_{\bar{\mathbf{y}}_k}$, is calculated over the distribution of average vectors $\bar{\mathbf{y}}_k$.
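As a rough illustration of the formula, the sketch below estimates the objective on a finite batch. It assumes the expectation can be approximated by a sample mean over $K$ sampled $\bar{\mathbf{y}}_k$, and that the target scores and predicted rewards are already available as plain numbers. The function name `rating_loss` and the example values are hypothetical and not taken from the source.

```python
def rating_loss(target_scores, predicted_rewards):
    """Monte Carlo estimate of L_rating = -E[(s - r)^2].

    target_scores:     s(y_bar_k) for each sampled y_bar_k
    predicted_rewards: r(x, y, y_bar_k) for the same samples
    Assumption: the expectation is replaced by a plain average over the batch.
    """
    if len(target_scores) != len(predicted_rewards):
        raise ValueError("need one predicted reward per target score")
    if not target_scores:
        raise ValueError("need at least one sample")
    squared_errors = [(s - r) ** 2 for s, r in zip(target_scores, predicted_rewards)]
    # Negative mean squared error: larger (closer to 0) means a better fit.
    return -sum(squared_errors) / len(squared_errors)


# Hypothetical batch of three samples.
targets = [1.0, 0.0, 0.5]
rewards = [0.5, 0.0, 1.0]
loss = rating_loss(targets, rewards)
print(loss)
```

Because the objective is the negated mean squared error, its largest possible value is 0, reached when every predicted reward equals its target score. A training loop that minimizes a loss would typically minimize the un-negated mean squared error instead, which is equivalent to maximizing this quantity.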


Updated 2025-10-08


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences