Empirical Formulation of Pair-wise Ranking Loss
The pair-wise ranking loss function for a reward model with parameters θ can be formulated by summing over samples from the preference dataset D. The expected loss incorporates the probability of drawing an input x and the conditional probability of drawing the preferred output pair (y_a, y_b) given x. Assuming a uniform distribution over the model inputs involved in sampling, the formula simplifies to the empirical form:

$$\mathcal{L}(\theta) = -\frac{1}{|D|} \sum_{(x,\, y_a,\, y_b) \in D} \log \sigma\big(r_\theta(x, y_a) - r_\theta(x, y_b)\big)$$

where $r_\theta(x, y)$ is the reward score the model assigns to response $y$ for input $x$, $y_a$ is the preferred response, $y_b$ is the rejected response, and $\sigma$ is the Sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$.
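As a concrete check of the formula, here is a minimal Python sketch; the function names, the numerically stable log-sigmoid helper, and the example scores are illustrative assumptions, not taken from the source:

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)):
    #   z >= 0: -log(1 + exp(-z));  z < 0: z - log(1 + exp(z))
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def pairwise_ranking_loss(score_pairs):
    # Empirical loss: the average of -log(sigmoid(r_a - r_b)) over all
    # (preferred, rejected) reward-score pairs drawn from the dataset D.
    total = sum(-log_sigmoid(r_a - r_b) for r_a, r_b in score_pairs)
    return total / len(score_pairs)

# Hypothetical reward-score pairs (r(x, y_a), r(x, y_b)), with y_a preferred.
pairs = [(2.3, 0.8), (1.1, 1.0), (0.5, 1.9)]
print(pairwise_ranking_loss(pairs))  # ~0.82; the mis-ordered third pair dominates
```

Note how the third pair, where the preferred response scores below the rejected one, contributes by far the largest term; the loss is smallest when the model cleanly ranks the preferred response higher.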
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Empirical Pair-wise Ranking Loss for RLHF Reward Model
Regularized Pairwise Loss Function for Reward Model Training
A reward model is being trained to prefer one machine-generated text response over another for a given input. The training process aims to minimize a loss function calculated as the negative logarithm of a sigmoid applied to the difference between the reward scores of the preferred (y_a) and non-preferred (y_b) responses. Given the following reward scores assigned by the model to a single pair of responses, which scenario contributes the least to the total loss, indicating the model is correctly differentiating between the responses?
Diagnosing Reward Model Training Issues
Analyzing Reward Model Performance via Loss Function
Listwise Loss Formula from Accumulated Pairwise Comparisons
Empirical Reward Model Loss Formula
A system learns a function, r(input, response), that assigns a numerical score indicating the quality of a response for a given input. The probability that response Y_a is preferred over response Y_b is then calculated using the formula: Probability = Sigmoid(r(input, Y_a) - r(input, Y_b)), where Sigmoid(z) = 1 / (1 + e^-z). Given the following scenarios for a single input, which one presents a logical inconsistency between the assigned scores and the resulting preference probability?
Preference Probability Calculation
Invariance of Preference Probability
Learn After
A reward model is being trained using a pair-wise ranking loss function. For a given prompt x, the preference dataset contains a pair of responses: a preferred response y_pref and a rejected response y_rej. Initially, the model assigns the following scores: R(x, y_pref) = 2.0 and R(x, y_rej) = 3.0. Based on the objective of the loss function, what is the most likely change to these scores after a single optimization step on this data point? (See the numeric sketch after this list.)
Analysis of a Weighted Ranking Loss
Handling Labeler Disagreement in Reward Modeling
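For the "Learn After" question above, with R(x, y_pref) = 2.0 and R(x, y_rej) = 3.0, the expected behavior can be sketched numerically. This is a simplified illustration that treats the two scores as directly trainable values rather than outputs of a parameterized model, with a made-up learning rate:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

r_pref, r_rej = 2.0, 3.0   # scores from the question above
lr = 0.5                   # hypothetical learning rate

# Loss on this single pair: -log(sigmoid(r_pref - r_rej)) ~ 1.31,
# large because the rejected response is currently scored higher.
p = sigmoid(r_pref - r_rej)   # ~0.269
grad = p - 1.0                # dLoss/dr_pref; dLoss/dr_rej is -grad

r_pref -= lr * grad           # 2.0 -> ~2.366: preferred score rises
r_rej  += lr * grad           # 3.0 -> ~2.634: rejected score falls

print(round(r_pref, 3), round(r_rej, 3))
```

A single step pushes the preferred score up and the rejected score down, shrinking the incorrectly ordered gap; in actual reward-model training the same gradients flow into the parameters θ rather than into the scores themselves.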