1Cademy - Hinge Loss Formula for Segment-Based Reward Model Training

The hinge loss is a max-margin loss function used for training binary classification models. In the context of segment-based reward modeling, it is formulated as:

$\mathcal{L}_{\mathrm{hinge}} = \max(0, 1 - r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k) \cdot \hat{r})$

In this formula, $r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k)$ is the score that the reward model assigns to the segment $\bar{\mathbf{y}}_k$. The term $\hat{r}$ is the ground-truth label for the segment, typically encoded as $+1$ for one class (e.g., 'ethical') and $-1$ for the other (e.g., 'unethical'). The loss is zero when the score has the correct sign and a magnitude of at least $1$, that is, when $r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k) \cdot \hat{r} \geq 1$. Otherwise, the loss equals the shortfall $1 - r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k) \cdot \hat{r}$, which grows linearly as the score moves further to the wrong side of the margin.
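To make the behavior of the formula concrete, the following is a minimal sketch in plain Python. The function name `hinge_loss` and the example scores are hypothetical, chosen only for illustration; in practice the score would come from a trained reward model rather than being hard-coded.

```python
def hinge_loss(score: float, label: int) -> float:
    """Per-segment hinge loss: max(0, 1 - score * label).

    score: the reward model's output r(x, y, y_bar_k) for one segment
           (here just a plain float, an assumption for illustration).
    label: the ground-truth label r_hat, which must be +1 or -1.
    """
    if label not in (-1, 1):
        raise ValueError("label must be +1 or -1")
    return max(0.0, 1.0 - score * label)


# Hypothetical scores for three segments that all carry the label +1.
examples = [
    (2.0, 1),   # correct sign, beyond the margin
    (0.5, 1),   # correct sign, but inside the margin
    (-1.0, 1),  # wrong sign
]

losses = [hinge_loss(score, label) for score, label in examples]
print(losses)  # [0.0, 0.5, 2.0]
```

The three cases correspond to the regimes described above: a confidently correct score incurs no loss, a correct but under-confident score incurs a small loss, and a score with the wrong sign incurs a loss greater than $1$.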


Updated 2026-05-03
