Formula

Pointwise Loss Function for Reward Model Training

Training a pointwise reward model means minimizing a loss that measures the discrepancy between the model's predicted reward, $r(\mathbf{x}, \mathbf{y})$, and the score that human annotators assigned to the same input-output pair, $\phi(\mathbf{x}, \mathbf{y})$. The task is therefore framed as regression, and the loss is typically mean squared error (MSE) or another regression loss. With MSE, the loss is

$$\mathcal{L}_{\text{point}} = \mathbb{E}\left[\left(\phi(\mathbf{x}, \mathbf{y}) - r(\mathbf{x}, \mathbf{y})\right)^2\right]$$

where the expectation is taken over the human-annotated pairs $(\mathbf{x}, \mathbf{y})$. By minimizing this loss, the model learns to produce rewards that closely match the absolute scores assigned by humans.
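
The following is a minimal sketch in plain Python of how $\mathcal{L}_{\text{point}}$ can be estimated and minimized. It assumes scalar human scores and approximates the expectation by an average over a batch. The names `pointwise_mse_loss` and `fit_linear_reward`, and the one-feature linear reward model $r(\mathbf{x}, \mathbf{y}) = w \cdot f(\mathbf{x}, \mathbf{y}) + b$, are hypothetical illustrations, not the book's implementation. A real reward model would be a neural network trained with an autodiff framework. The toy model here only makes the gradient of the loss explicit.

```python
from typing import List, Sequence, Tuple


def pointwise_mse_loss(human_scores: Sequence[float],
                       predicted_rewards: Sequence[float]) -> float:
    """Batch estimate of L_point = E[(phi(x, y) - r(x, y))^2].

    human_scores[i] plays the role of phi(x_i, y_i) and
    predicted_rewards[i] the role of r(x_i, y_i).
    """
    if len(human_scores) != len(predicted_rewards) or len(human_scores) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(phi - r) ** 2 for phi, r in zip(human_scores, predicted_rewards)]
    # The mean over the batch stands in for the expectation.
    return sum(squared_errors) / len(squared_errors)


def fit_linear_reward(features: Sequence[float],
                      human_scores: Sequence[float],
                      lr: float = 0.1,
                      steps: int = 2000) -> Tuple[float, float]:
    """Fit a hypothetical toy reward model r = w * f + b by gradient descent
    on the pointwise MSE loss. features[i] is an assumed scalar feature
    f(x_i, y_i) of the i-th annotated pair."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        preds: List[float] = [w * f + b for f in features]
        # dL/dw = mean(-2 * (phi - r) * f), dL/db = mean(-2 * (phi - r))
        grad_w = sum(-2.0 * (phi - r) * f
                     for phi, r, f in zip(human_scores, preds, features)) / n
        grad_b = sum(-2.0 * (phi - r) for phi, r in zip(human_scores, preds)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    feats = [0.0, 0.5, 1.0]
    scores = [1.0, 2.0, 3.0]  # illustrative human scores, not real data
    w, b = fit_linear_reward(feats, scores)
    preds = [w * f + b for f in feats]
    print(f"w={w:.3f}, b={b:.3f}, loss={pointwise_mse_loss(scores, preds):.2e}")
```

Because the illustrative scores lie exactly on the line $2f + 1$, gradient descent drives the loss toward zero and recovers $w \approx 2$, $b \approx 1$. This mirrors the claim that minimizing $\mathcal{L}_{\text{point}}$ pulls predicted rewards toward the annotators' absolute scores.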

Updated 2026-05-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences