1Cademy - Bradley-Terry Model for Pairwise Preference Probability

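The probability that one output, $\mathbf{y}_a$, is preferred over another, $\mathbf{y}_b$, given an input $\mathbf{x}$, can be defined using the Bradley-Terry model. This approach uses a reward function, $r(\mathbf{x}, \mathbf{y})$, to assign a scalar score to each output. The preference probability is the sigmoid of the difference between the two scores:

$$\Pr(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x}) = \mathrm{Sigmoid}\big(r(\mathbf{x}, \mathbf{y}_a) - r(\mathbf{x}, \mathbf{y}_b)\big), \qquad \mathrm{Sigmoid}(z) = \frac{1}{1 + e^{-z}}$$

Because the sigmoid maps any real number into the interval $(0, 1)$, the reward difference always yields a valid probability. Equal rewards give a probability of $0.5$, a larger reward for $\mathbf{y}_a$ pushes the probability toward $1$, and the two orderings are complementary: $\Pr(\mathbf{y}_a \succ \mathbf{y}_b \mid \mathbf{x}) + \Pr(\mathbf{y}_b \succ \mathbf{y}_a \mid \mathbf{x}) = 1$.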
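
The formula above can be illustrated with a minimal Python sketch. It assumes the two rewards $r(\mathbf{x}, \mathbf{y}_a)$ and $r(\mathbf{x}, \mathbf{y}_b)$ have already been computed by some reward function and are passed in as plain floats. The function names `sigmoid` and `preference_probability` are hypothetical, chosen only for this illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(z) is small, so this form cannot overflow.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that output a is preferred over output b.

    reward_a and reward_b stand for r(x, y_a) and r(x, y_b); how they are
    produced is outside the scope of this sketch.
    """
    return sigmoid(reward_a - reward_b)


if __name__ == "__main__":
    # Illustrative reward values, not taken from any real model.
    p_ab = preference_probability(2.0, 1.0)
    p_ba = preference_probability(1.0, 2.0)
    print(f"Pr(y_a > y_b | x) = {p_ab:.4f}")  # about 0.7311
    print(f"Pr(y_b > y_a | x) = {p_ba:.4f}")  # about 0.2689
```

Only the difference between the two rewards matters here: adding the same constant to both scores leaves the preference probability unchanged.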