Pair-wise Ranking Loss Formula for RLHF Reward Model
The pair-wise ranking loss function is used to train a reward model. The expected loss is expressed as:

$$\mathcal{L}(\phi) \;=\; -\,\mathbb{E}_{(x,\, y_{k_1},\, y_{k_2}) \sim D}\Big[\log \sigma\big(r_\phi(x, y_{k_1}) - r_\phi(x, y_{k_2})\big)\Big]$$

In this equation, ϕ represents the parameters of the reward model r_ϕ, σ(·) is the sigmoid function, and D is a set of tuples (x, y_k1, y_k2), each consisting of an input x and a pair of outputs in which y_k1 is preferred over y_k2. The subscript (x, y_k1, y_k2) ~ D signifies a sampling operation drawing a tuple from D with a specific probability. For instance, we might first draw a model input x with a uniform distribution, then draw a pair of outputs based on the conditional probability that y_k1 is preferred over y_k2 given x, denoted mathematically as Pr(y_k1 ≻ y_k2 | x).
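As a concrete illustration of the loss above, here is a minimal Python sketch that evaluates -log σ(r_ϕ(x, y_k1) - r_ϕ(x, y_k2)) for a single tuple. The function name and the hard-coded scores are illustrative stand-ins; a real reward model would produce the two scores from the prompt and the two responses.

```python
import math

def pairwise_ranking_loss(r_preferred: float, r_rejected: float) -> float:
    """Per-tuple loss: -log sigmoid(r(x, y_k1) - r(x, y_k2)).

    The arguments stand in for the reward model's scalar scores
    r_phi(x, y_k1) and r_phi(x, y_k2).
    """
    diff = r_preferred - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# The loss shrinks as the margin between preferred and rejected scores grows.
print(round(pairwise_ranking_loss(2.0, -0.2), 3))  # 0.105 (confident, correct)
print(round(pairwise_ranking_loss(0.1, 0.0), 3))   # 0.644 (barely correct)
print(round(pairwise_ranking_loss(-1.0, 1.0), 3))  # 2.127 (wrong ordering, large loss)
```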

References
Foundations of Large Language Models Course
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
Input Formulation for the RLHF Reward Model
Diagram of Reward Score Calculation using an LLM
An engineer is implementing a reward model by adapting a pre-trained language model. After feeding a concatenated prompt and response sequence into the model, they have access to the final layer's hidden state vector for each token in the sequence. To derive a single scalar reward score from these vectors, which of the following procedures should they implement?
You are tasked with implementing a reward model to score a response generated for a given prompt. Arrange the following steps in the correct chronological order to transform the prompt-response pair into a final scalar reward score.
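Both implementation questions above point at the same standard recipe: run the concatenated prompt and response through the pre-trained LM, take the final layer's hidden state at the last token, and project it to a scalar with a small linear head. Below is a minimal PyTorch sketch of that projection step; the class name, dimensions, and the random tensor are hypothetical, standing in for a real model's activations.

```python
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    """Hypothetical head mapping final-layer hidden states to one scalar reward."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Single linear projection from hidden_dim to a scalar score.
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Take the LAST token's hidden state, which has attended to the whole
        # prompt+response sequence, then project it to one number per example.
        last_token = hidden_states[:, -1, :]       # (batch, hidden_dim)
        return self.score(last_token).squeeze(-1)  # (batch,)

head = RewardHead(hidden_dim=16)
fake_states = torch.randn(2, 10, 16)  # stand-in for real LM activations
print(head(fake_states).shape)        # torch.Size([2])
```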
Reward Model Implementation Analysis
Pair-wise Ranking Loss Formula for RLHF Reward Model
Empirical Reward Model Loss Formula using Bradley-Terry Model
A reward model is trained to learn human preferences by minimizing the following loss function, which is an expectation over a preference dataset D:

$$\mathcal{L}(\phi) = -\,\mathbb{E}_{(x,\, y_{k_1},\, y_{k_2}) \sim D}\Big[\log \sigma\big(r_\phi(x, y_{k_1}) - r_\phi(x, y_{k_2})\big)\Big]$$

In this dataset, y_k1 represents a response preferred over response y_k2 for a given input x. What is the primary effect of successfully minimizing this loss function on the model's behavior?
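In practice the expectation is approximated by an average over a finite preference set (the empirical loss the neighboring cards refer to). A small Python sketch, with hypothetical scalar scores standing in for r_ϕ(x, y):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def empirical_loss(score_pairs):
    """Mean of -log sigmoid(r_k1 - r_k2) over a finite preference set,
    approximating the expectation over the dataset D."""
    total = sum(-math.log(sigmoid(r_k1 - r_k2)) for r_k1, r_k2 in score_pairs)
    return total / len(score_pairs)

# Hypothetical (preferred, rejected) score pairs for three sampled tuples.
pairs = [(2.0, -0.2), (0.5, 0.4), (1.3, -1.0)]
print(round(empirical_loss(pairs), 3))  # ~0.282
```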
Reward Model Training Diagnosis
Composition of Reward Model Parameters (ϕ)
Approximating Expected Loss with Empirical Loss
Empirical Reward Model Loss Formula
Impact of Prediction Confidence on Reward Model Loss
Pair-wise Ranking Loss Formula for RLHF Reward Model
Simplified Notation for Preference Probability Models
Reward Model Loss as Negative Log-Likelihood
Empirical Reward Model Loss Formula using Bradley-Terry Model
A system for evaluating generated text uses a scalar scoring function, r(input, output), to assign a numerical score to each potential output. For a given input, 'Output A' receives a score of 2.0, and 'Output B' receives a score of -0.2. The system models the probability that one output is preferred over another using the sigmoid of the difference between their scores. Based on this model, what is the approximate probability that 'Output A' is preferred over 'Output B'?
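The arithmetic this question asks for, checked in a couple of lines of Python (the sigmoid-of-difference model is the one stated in the question):

```python
import math

# Scores from the question: r(A) = 2.0, r(B) = -0.2.
score_a, score_b = 2.0, -0.2
prob = 1.0 / (1.0 + math.exp(-(score_a - score_b)))  # sigmoid(2.2)
print(round(prob, 3))  # 0.9: Output A is preferred with roughly 90% probability
```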
Impact of Score Transformation on Preference Probabilities
Derivation of the Bradley-Terry Preference Formula
Omission of Parameter Superscript in Probability Notation
A preference model calculates the probability that output Y_a is preferred over output Y_b by applying the sigmoid function to the difference in their scalar scores, score(Y_a) - score(Y_b). If the initial scores for Y_a and Y_b result in a preference probability greater than 50% but less than 100%, which of the following transformations to the scores is guaranteed to leave this probability unchanged?
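A quick numerical check of the invariance the question probes, using illustrative scores 1.5 and 0.5: adding the same constant to both scores cancels in the difference, while rescaling does not.

```python
import math

def pref_prob(score_a: float, score_b: float) -> float:
    # P(Y_a preferred) = sigmoid(score(Y_a) - score(Y_b)).
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

a, b = 1.5, 0.5
print(round(pref_prob(a, b), 3))            # 0.731
print(round(pref_prob(a + 10, b + 10), 3))  # 0.731: a shared shift cancels
print(round(pref_prob(2 * a, 2 * b), 3))    # 0.881: scaling changes the gap
```

Any transformation that preserves the score difference preserves the probability; transformations that change the difference move it.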
Pair-wise Ranking Loss Formula for RLHF Reward Model
A team is creating a dataset to train a reward model. The model's objective is to learn to assign higher scores to helpful, detailed responses than to unhelpful or overly brief ones. For the input prompt x = 'Explain the water cycle.', which of the following data samples, represented as a tuple (prompt, chosen_response, rejected_response), would be the most effective and correctly structured training point for this objective?
Constructing a Preference Data Sample from Human Feedback
A human evaluator is presented with the following prompt and two responses. The evaluator chooses Response A as the better one. This interaction is used to create a single data point for training a reward model, structured as a tuple containing an input prompt (x), a preferred response (y_k1), and a rejected response (y_k2). Match each item below to its correct role in this data sample.
Prompt: 'Summarize the plot of Hamlet in three sentences.' Response A: 'Hamlet is a play about a prince who seeks revenge for his father's murder. He feigns madness, confronts his mother, and duels his uncle's co-conspirator, leading to a tragic end for the royal family.' Response B: 'Hamlet is a famous play.'
Preference Dataset Sampling Operation
Optimal Reward Model Parameter Estimation
Empirical Reward Model Loss Formula using Bradley-Terry Model
Pair-wise Ranking Loss Formula for RLHF Reward Model
Correcting a Reward Model's Preference Error
A reward model is being trained using a dataset where each entry consists of a prompt, a 'preferred' response, and a 'rejected' response, as judged by humans. The training process works by adjusting the model's parameters to minimize a ranking loss function. What is the primary effect of successfully minimizing this ranking loss?
A reward model is being trained on a dataset of human preferences, where each data point consists of a prompt, a preferred response, and a rejected response. The training process aims to minimize a ranking loss function. For a single data point, which of the following outcomes would generate the largest loss value, thereby prompting the most significant update to the model's parameters?
Reusing Transformer Training for Reward Models
Learn After
Empirical Formulation of Pair-wise Ranking Loss
Empirical Pair-wise Ranking Loss for RLHF Reward Model
Regularized Pairwise Loss Function for Reward Model Training
A reward model is being trained to prefer one machine-generated text response over another for a given input. The training process aims to minimize a loss function calculated as the negative logarithm of a sigmoid applied to the difference between the reward scores of the preferred (y_k1) and non-preferred (y_k2) responses. Given the following reward scores assigned by the model to a single pair of responses, which scenario contributes the least to the total loss, indicating the model is correctly differentiating between the responses?
Diagnosing Reward Model Training Issues
Analyzing Reward Model Performance via Loss Function