Formula

Bradley-Terry Model for Pairwise Preference Probability

The probability that one output, $\mathbf{y}_a$, is preferred over another, $\mathbf{y}_b$, given an input $\mathbf{x}$, can be defined using the Bradley-Terry model. This approach uses a reward function, $r(\mathbf{x}, \mathbf{y})$, to assign a scalar score to each output. The preference probability is the sigmoid of the difference between the two scores:

$$\Pr(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x}) = \mathrm{Sigmoid}\big(r(\mathbf{x}, \mathbf{y}_a) - r(\mathbf{x}, \mathbf{y}_b)\big)$$

Because the sigmoid maps any real number into the interval $(0, 1)$, the reward difference always yields a valid probability: equal rewards give $0.5$, and the probability approaches $1$ as the reward of $\mathbf{y}_a$ grows relative to that of $\mathbf{y}_b$.
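As a minimal sketch of the formula above, the following Python snippet computes the preference probability from two reward scores. The function names `sigmoid` and `preference_probability` are illustrative assumptions, and the rewards are passed in as plain numbers rather than produced by an actual reward model.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that output a is preferred over output b.

    reward_a and reward_b stand in for r(x, y_a) and r(x, y_b); here they
    are hypothetical scores supplied directly by the caller.
    """
    return sigmoid(reward_a - reward_b)


if __name__ == "__main__":
    # Equal rewards: neither output is favoured.
    print(preference_probability(1.0, 1.0))  # 0.5
    # A reward gap of 1.0 in favour of output a.
    print(round(preference_probability(2.0, 1.0), 3))  # 0.731
```

Only the difference between the two rewards matters, so adding the same constant to both scores leaves the probability unchanged, and swapping the arguments gives the complementary probability.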

Updated 2026-05-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences