Formula

Modeling Pairwise Preference Probability with a Reward Function

The probability that a response $\mathbf{y}_a$ is preferred over another response $\mathbf{y}_b$ given an input $\mathbf{x}$ is modeled using a learned reward function $r(\mathbf{x}, \mathbf{y})$. This is achieved by applying the sigmoid function to the difference between the reward scores of the two responses, as specified by the Bradley-Terry model. The formula is:

$$\text{Pr}(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x}) = \text{Sigmoid}\big(r(\mathbf{x}, \mathbf{y}_a) - r(\mathbf{x}, \mathbf{y}_b)\big)$$

This is a foundational component for training reward models in RLHF.
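
As a numerical illustration, here is a minimal Python sketch of this probability. The function name `preference_probability` and the example reward values are hypothetical, chosen only to show how the sigmoid of the reward difference behaves; they are not taken from the source.

```python
import math

def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that the response scored reward_a is
    preferred over the response scored reward_b for the same input x."""
    # Pr(y_a > y_b | x) = sigmoid(r(x, y_a) - r(x, y_b))
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

# Hypothetical reward scores r(x, y_a) = 1.2 and r(x, y_b) = 0.4:
# sigmoid(0.8) ~= 0.69, so y_a is preferred about 69% of the time.
print(preference_probability(1.2, 0.4))  # ~0.690

# Reward-model training commonly minimizes the negative log of this
# probability over human-labeled preference pairs.
loss = -math.log(preference_probability(1.2, 0.4))
print(loss)  # ~0.371
```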

