Formula

Regularized Pairwise Loss Function for Reward Model Training

To prevent the reward scores from becoming excessively large during training, a regularization term can be added to the standard pairwise loss function. This regularized loss, $L_{\text{reg}}$, combines the pairwise loss ($L_{\text{pair}}$) with a term that penalizes the squared sum of the rewards for a given pair. The complete formula is:

$$L_{\text{reg}} = -\mathbb{E}_{(\mathbf{x},\mathbf{y}_a,\mathbf{y}_b)\sim\mathcal{D}_r}\left[\log \mathrm{Pr}_{\phi}(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x})\right] + \mathbb{E}_{(\mathbf{x},\mathbf{y}_a,\mathbf{y}_b)\sim\mathcal{D}_r}\left[\left(r(\mathbf{x}, \mathbf{y}_a) + r(\mathbf{x}, \mathbf{y}_b)\right)^2\right]$$
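
The per-pair computation can be sketched in a few lines. This is a minimal illustration, assuming the standard Bradley-Terry form $\mathrm{Pr}_{\phi}(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x}) = \sigma(r(\mathbf{x},\mathbf{y}_a) - r(\mathbf{x},\mathbf{y}_b))$ and taking the weight on the regularization term to be 1 (in practice it would typically be a tunable hyperparameter, which the formula above leaves implicit):

```python
import math

def regularized_pairwise_loss(r_a, r_b):
    """Regularized pairwise loss for a single preference pair.

    Assumes the Bradley-Terry model: Pr(y_a > y_b | x) = sigmoid(r_a - r_b).
    r_a: reward of the preferred response y_a
    r_b: reward of the rejected response y_b
    """
    # Pairwise term: negative log-likelihood that y_a beats y_b
    pair = -math.log(1.0 / (1.0 + math.exp(-(r_a - r_b))))
    # Regularization term: pushes r_a + r_b toward zero, keeping the
    # reward scale from drifting (weight assumed to be 1 here)
    reg = (r_a + r_b) ** 2
    return pair + reg

# Well-separated rewards centered near zero incur almost no penalty,
# while the same separation with a large common offset is penalized heavily.
print(regularized_pairwise_loss(1.0, -1.0))
print(regularized_pairwise_loss(101.0, 99.0))
```

Note that both calls have the same reward gap $r_a - r_b = 2$, so the pairwise term is identical; only the regularization term distinguishes them, which is exactly the behavior the penalty is meant to add.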


Updated 2026-05-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course
