Formula

Empirical Formulation of Pair-wise Ranking Loss

The pair-wise ranking loss for a reward model with parameters $\omega$ can be written as a sum over samples from the preference dataset $\mathcal{D}_r$. The expected loss weights each sample by the probability $\Pr(\mathbf{x})$ of drawing an input $\mathbf{x}$, and by the conditional probability $\Pr(\mathbf{y}_{k_1} \succ \mathbf{y}_{k_2} \mid \mathbf{x})$ of drawing the output pair in which $\mathbf{y}_{k_1}$ is preferred over $\mathbf{y}_{k_2}$. Assuming a uniform distribution over the $K$ model inputs involved in sampling, so that $\Pr(\mathbf{x}) = 1/K$, the formula simplifies to:

$$
\begin{aligned}
\mathrm{Loss}_{\omega}(\mathcal{D}_r) &= -\sum \Pr(\mathbf{x}) \cdot \Pr(\mathbf{y}_{k_1} \succ \mathbf{y}_{k_2} \mid \mathbf{x}) \cdot \log\big(\mathrm{Sigmoid}(R_{\omega}(\mathbf{x}, \mathbf{y}_{k_1}) - R_{\omega}(\mathbf{x}, \mathbf{y}_{k_2}))\big) \\
&= -\frac{1}{K} \sum \Pr(\mathbf{y}_{k_1} \succ \mathbf{y}_{k_2} \mid \mathbf{x}) \cdot \log\big(\mathrm{Sigmoid}(R_{\omega}(\mathbf{x}, \mathbf{y}_{k_1}) - R_{\omega}(\mathbf{x}, \mathbf{y}_{k_2}))\big)
\end{aligned}
$$
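As a rough illustration of the simplified form, the following Python sketch evaluates the loss for a handful of preference samples. It is a minimal sketch under stated assumptions, not a reference implementation: the function names `log_sigmoid` and `pairwise_ranking_loss`, the tuple layout of each sample, and the reward values are all hypothetical, and the reward scores $R_{\omega}(\mathbf{x}, \mathbf{y})$ are assumed to have been computed already by some reward model.

```python
import math


def log_sigmoid(z: float) -> float:
    """Return log(Sigmoid(z)), written to avoid overflow for large |z|."""
    if z >= 0:
        # log(1 / (1 + exp(-z))) = -log(1 + exp(-z))
        return -math.log1p(math.exp(-z))
    # For negative z, rewrite as z - log(1 + exp(z)) so exp() stays small.
    return z - math.log1p(math.exp(z))


def pairwise_ranking_loss(samples, num_inputs: int) -> float:
    """Evaluate the simplified pair-wise ranking loss.

    samples: iterable of (pref_prob, reward_preferred, reward_other), where
        pref_prob plays the role of Pr(y_k1 > y_k2 | x) and the two rewards
        are R(x, y_k1) and R(x, y_k2). This layout is an assumption made
        for illustration.
    num_inputs: K, the number of model inputs, so that Pr(x) = 1 / K.
    """
    total = 0.0
    for pref_prob, reward_preferred, reward_other in samples:
        total += pref_prob * log_sigmoid(reward_preferred - reward_other)
    return -total / num_inputs


# Hypothetical rewards for two inputs (K = 2), one preference pair each.
example_samples = [
    (1.0, 2.0, 0.5),  # preferred output already scores higher
    (1.0, 0.1, 0.4),  # preferred output scores lower, so it is penalized more
]
example_loss = pairwise_ranking_loss(example_samples, num_inputs=2)
print(f"loss = {example_loss:.4f}")
```

When the two rewards in a pair are equal, the corresponding term contributes $\log 2$ times its weight, and the loss shrinks as the reward of the preferred output grows relative to the other output.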

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models
