Regularized Pairwise Loss Function for Reward Model Training
To prevent the reward scores from becoming excessively large during training, a regularization term can be added to the standard pairwise loss function. This regularized loss, $\mathcal{L}_{\text{reg}}$, combines the pairwise loss ($-\log \sigma(r(x, y_w) - r(x, y_l))$) with a term that penalizes the squared sum of the rewards for a given pair. The complete formula is:

$$\mathcal{L}_{\text{reg}} = -\log \sigma\big(r(x, y_w) - r(x, y_l)\big) + \lambda \big(r(x, y_w) + r(x, y_l)\big)^2$$

where $r(x, y_w)$ and $r(x, y_l)$ are the reward scores of the preferred and non-preferred responses to input $x$, and $\lambda$ controls the strength of the regularization.
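A minimal sketch of this loss in plain Python, assuming the penalty takes the form $\lambda (r(x, y_w) + r(x, y_l))^2$ as the "squared sum" wording suggests; the function name and the value of `lam` are illustrative, not from the source:

```python
import math

def regularized_pairwise_loss(r_w: float, r_l: float, lam: float = 0.1) -> float:
    """Pairwise ranking loss plus a penalty on the squared sum of the rewards."""
    sigmoid = 1.0 / (1.0 + math.exp(-(r_w - r_l)))
    pairwise = -math.log(sigmoid)        # -log σ(r(x, y_w) - r(x, y_l))
    penalty = lam * (r_w + r_l) ** 2     # λ (r(x, y_w) + r(x, y_l))²
    return pairwise + penalty

# Both pairs have the same margin (4.0), so the pairwise term is identical,
# but the uniformly shifted pair pays a large regularization penalty:
small_scores = regularized_pairwise_loss(2.0, -2.0)   # penalty term is 0
large_scores = regularized_pairwise_loss(12.0, 8.0)   # penalty term is 0.1 * 20² = 40
```

Because the pairwise term depends only on the score difference, it is unchanged by shifting both scores; the penalty is what discourages the scores from drifting upward together.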
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Empirical Formulation of Pair-wise Ranking Loss
Empirical Pair-wise Ranking Loss for RLHF Reward Model
A reward model is being trained to prefer one machine-generated text response over another for a given input. The training process aims to minimize a loss function calculated as the negative logarithm of a sigmoid applied to the difference between the reward scores of the preferred ($r(x, y_w)$) and non-preferred ($r(x, y_l)$) responses. Given the following reward scores assigned by the model to a single pair of responses, which scenario contributes the least to the total loss, indicating the model is correctly differentiating between the responses?
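The scenario comparison above can be sketched directly: the loss $-\log \sigma(r_w - r_l)$ depends only on the margin between the two scores, so the pair with the largest margin in favour of the preferred response contributes the least. The score pairs below are illustrative, not the card's original options:

```python
import math

def pairwise_loss(r_w: float, r_l: float) -> float:
    # -log σ(r(x, y_w) - r(x, y_l)): shrinks as the preferred score
    # increasingly exceeds the non-preferred one
    return -math.log(1.0 / (1.0 + math.exp(-(r_w - r_l))))

# A larger margin for the preferred response gives a smaller loss;
# a negative margin (model prefers the wrong response) gives a large loss.
for r_w, r_l in [(3.0, -1.0), (1.0, 0.5), (0.0, 2.0)]:
    print(f"margin {r_w - r_l:+.1f} -> loss {pairwise_loss(r_w, r_l):.4f}")
```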
Diagnosing Reward Model Training Issues
Analyzing Reward Model Performance via Loss Function
Learn After
Role of Regularization in Mitigating Reward Model Underdetermination
A reward model is being trained using a loss function that includes a regularization term to prevent its output scores from growing excessively large. The regularization component for a single pair of responses, $y_w$ and $y_l$, to an input, $x$, is calculated as $\lambda (r(x, y_w) + r(x, y_l))^2$, where $r$ is the reward score function. A higher value for this term results in a larger penalty. Given the following four pairs of reward scores, which pair would incur the largest penalty from this specific regularization term?
A reward model is being trained with a loss function that includes a regularization component. This component adds a penalty proportional to $(r(x, y_w) + r(x, y_l))^2$ for a given input $x$ and a pair of responses $(y_w, y_l)$. The goal of this penalty is to prevent reward scores from becoming excessively large. Consider two scenarios for the reward scores assigned to a pair of responses:
- Scenario 1: and
- Scenario 2: and
Based on the formula for the penalty, which of the following statements correctly analyzes the effect of the regularization in these two scenarios?
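The original scenario values were not preserved in this note, but the effect of the penalty can be checked with hypothetical scores. Assuming the penalty is proportional to the squared sum of the pair's rewards, scores that are symmetric around zero incur no penalty at all, while scores shifted uniformly upward are penalized; this is what anchors the otherwise underdetermined offset of the reward scale (the pairwise loss alone cannot distinguish $r$ from $r + c$):

```python
# Hypothetical scores for illustration; λ is omitted since it only scales the penalty.
def penalty(r_w: float, r_l: float) -> float:
    # (r(x, y_w) + r(x, y_l))²: the squared sum of the pair's reward scores
    return (r_w + r_l) ** 2

print(penalty(5.0, -5.0))   # symmetric around zero: no penalty
print(penalty(15.0, 5.0))   # same margin, shifted up by 10: penalized
```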
Diagnosing Reward Model Score Inflation