Learn Before
Formula
Squared Sum of Rewards Regularization
To make the supervision signal for training the reward model more robust, a regularization term based on the squared sum of rewards can be added to the pairwise comparison loss in RLHF. This term helps mitigate the underdetermination of reward models: the pairwise comparison loss depends only on reward *differences*, so it leaves the absolute scale and offset of the rewards unconstrained. Penalizing the squared sum of the two rewards keeps their magnitudes centered near zero. The regularized loss function is commonly formulated as:

$$\mathcal{L}(\theta) = -\mathbb{E}_{(x,\, y_w,\, y_l)\sim D}\Big[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\Big] + \eta\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim D}\Big[\big(r_\theta(x, y_w) + r_\theta(x, y_l)\big)^2\Big]$$

where $r_\theta$ is the reward model, $y_w$ and $y_l$ are the preferred and dispreferred responses to prompt $x$, $\sigma$ is the sigmoid function, and $\eta > 0$ weights the regularization term.
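The loss above can be sketched in plain Python for a single preference pair. The function name and the weight `eta` are illustrative choices, not part of any specific library; a real reward-model trainer would compute this in batches with an autograd framework.

```python
import math

def regularized_pairwise_loss(r_chosen: float, r_rejected: float, eta: float = 0.01) -> float:
    """Pairwise comparison loss with squared-sum-of-rewards regularization.

    Loss = -log sigmoid(r_chosen - r_rejected) + eta * (r_chosen + r_rejected)^2

    The first term is the standard Bradley-Terry preference loss; the second
    penalizes the squared sum of the two rewards, discouraging the model from
    drifting to arbitrarily large or offset reward values (underdetermination).
    """
    # Bradley-Terry term: depends only on the reward *difference*.
    margin = r_chosen - r_rejected
    bt_loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    # Regularization term: depends on the reward *sum*, anchoring the scale.
    reg = eta * (r_chosen + r_rejected) ** 2
    return bt_loss + reg
```

Note that shifting both rewards by the same constant leaves the Bradley-Terry term unchanged but increases the regularization term, which is precisely how the penalty pins down the otherwise-free offset.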
Updated 2026-05-02
Tags
Foundations of Large Language Models
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences