1Cademy - Bradley-Terry Model for Preference Probability

Formula

Bradley-Terry Model for Preference Probability

The probability that a response $\mathbf{y}_a$ is preferred over another response $\mathbf{y}_b$ (denoted $\mathbf{y}_a \succ \mathbf{y}_b$), given an input $\mathbf{x}$, can be modeled using a formulation based on the Bradley-Terry model. This model defines the probability as a sigmoid function of the difference between the two responses' reward scores, $r(\mathbf{x}, \mathbf{y}_a)$ and $r(\mathbf{x}, \mathbf{y}_b)$. The formula is: $\text{Pr}_{\theta}(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x}) = \text{Sigmoid}(r(\mathbf{x}, \mathbf{y}_a) - r(\mathbf{x}, \mathbf{y}_b))$. The sigmoid maps the reward difference, which can be any real number, to a valid probability strictly between 0 and 1: equal rewards give a probability of 0.5, and the probability approaches 1 as $r(\mathbf{x}, \mathbf{y}_a)$ grows relative to $r(\mathbf{x}, \mathbf{y}_b)$.
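The link to the classical Bradley-Terry form can be made explicit: writing the preference probability as $\frac{\exp(r(\mathbf{x}, \mathbf{y}_a))}{\exp(r(\mathbf{x}, \mathbf{y}_a)) + \exp(r(\mathbf{x}, \mathbf{y}_b))}$ and dividing numerator and denominator by $\exp(r(\mathbf{x}, \mathbf{y}_a))$ gives $\frac{1}{1 + \exp(-(r(\mathbf{x}, \mathbf{y}_a) - r(\mathbf{x}, \mathbf{y}_b)))}$, which is the sigmoid of the reward difference. The following is a minimal Python sketch of that computation, not an implementation taken from the course. The function names `sigmoid` and `preference_probability` are hypothetical, and the reward scores are assumed to be plain floats that some reward model has already produced.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form exp(z) / (1 + exp(z)).
    ez = math.exp(z)
    return ez / (1.0 + ez)


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Pr(y_a > y_b | x) as the sigmoid of the reward difference.

    reward_a and reward_b stand in for r(x, y_a) and r(x, y_b); how they
    are computed is outside the scope of this sketch.
    """
    return sigmoid(reward_a - reward_b)


if __name__ == "__main__":
    # Illustrative reward values only; they are not from any real model.
    for r_a, r_b in [(1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]:
        p = preference_probability(r_a, r_b)
        print(f"r_a={r_a}, r_b={r_b} -> Pr(y_a preferred) = {p:.4f}")
```

Only the difference between the two rewards matters: adding the same constant to both scores leaves the probability unchanged, and swapping the two responses yields the complementary probability.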

Updated 2026-05-01

References

Reference of Foundations of Large Language Models Course

Learn After

A preference model calculates the probability that response y_a is preferred over response y_b for a given input x using the formula: Pr(y_a > y_b | x) = Sigmoid(r(x, y_a) - r(x, y_b)), where r(x, y) is a real-valued score for a given response. Based on this model, which of the following statements accurately describes its behavior?
A preference model calculates the probability that a 'winning' response, y_w, is preferred over a 'losing' response, y_l, for a given input x. The model uses the formula: Pr(y_w > y_l | x) = Sigmoid(r(x, y_w) - r(x, y_l)), where r(x, y) is a scalar reward score. In a specific training example, the reward scores for the two responses are found to be nearly identical, i.e., r(x, y_w) ≈ r(x, y_l). What does this imply about the calculated preference probability?
Interpreting Preference Model Output