Formula

Pair-wise Ranking Loss Formula for RLHF Reward Model

The pair-wise ranking loss function is used to train the reward model. It is defined as an expectation over preference data:

$$\mathrm{Loss}_{\omega}(\mathcal{D}_r) = -\mathbb{E}_{(\mathbf{x},\mathbf{y}_{k_1},\mathbf{y}_{k_2}) \sim \mathcal{D}_r} \log\left(\mathrm{Sigmoid}\left(R_{\omega}(\mathbf{x},\mathbf{y}_{k_1}) - R_{\omega}(\mathbf{x},\mathbf{y}_{k_2})\right)\right)$$

In this equation, $\omega$ represents the parameters of the reward model $R_{\omega}$, and $\mathcal{D}_r$ is a set of tuples, each consisting of an input and a pair of outputs. The notation $(\mathbf{x},\mathbf{y}_{k_1},\mathbf{y}_{k_2}) \sim \mathcal{D}_r$ signifies a sampling operation drawing a tuple from $\mathcal{D}_r$ with a specific probability. For instance, we might first draw a model input $\mathbf{x}$ with a uniform distribution, then draw a pair of outputs based on the conditional probability that $\mathbf{y}_{k_1}$ is preferred over $\mathbf{y}_{k_2}$ given $\mathbf{x}$, denoted mathematically as $\Pr(\mathbf{y}_{k_1} \succ \mathbf{y}_{k_2} \mid \mathbf{x})$.
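The loss above can be sketched in a few lines of plain Python. This is a minimal illustration, not a training loop: it assumes the reward model has already scored each preferred output ($\mathbf{y}_{k_1}$) and each dispreferred output ($\mathbf{y}_{k_2}$), and it approximates the expectation by averaging over a batch of such score pairs.

```python
import math

def pairwise_ranking_loss(chosen_rewards, rejected_rewards):
    """Mean negative log-sigmoid of the reward margin
    R(x, y_k1) - R(x, y_k2), averaged over a batch of pairs."""
    assert len(chosen_rewards) == len(rejected_rewards)
    total = 0.0
    for r1, r2 in zip(chosen_rewards, rejected_rewards):
        margin = r1 - r2
        # -log(sigmoid(m)) = log(1 + exp(-m)); log1p is numerically stabler
        total += math.log1p(math.exp(-margin))
    return total / len(chosen_rewards)
```

Note that when the reward model assigns the same score to both outputs, the margin is zero and the per-pair loss is $\log 2$; as the margin in favor of the preferred output grows, the loss decays toward zero, which is exactly the gradient signal that pushes $R_{\omega}$ to rank preferred outputs higher.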

Updated 2026-05-02
