Derivation of the Bradley-Terry Preference Formula
The Bradley-Terry model can be used to express the probability that one item, $y_a$, is preferred over another, $y_b$, given a context $x$. The model starts by defining this probability as the ratio of the exponentiated reward score of the preferred item to the sum of the exponentiated scores of both items:

$$P(y_a \succ y_b \mid x) = \frac{\exp(r(x, y_a))}{\exp(r(x, y_a)) + \exp(r(x, y_b))}$$

This formulation can be algebraically simplified to the sigmoid function of the difference between the two reward scores. Dividing the numerator and the denominator by $\exp(r(x, y_a))$ gives

$$P(y_a \succ y_b \mid x) = \frac{1}{1 + \exp\big(r(x, y_b) - r(x, y_a)\big)} = \sigma\big(r(x, y_a) - r(x, y_b)\big),$$

where $\sigma(z) = 1/(1 + e^{-z})$ is the sigmoid function. This derivation shows how a model based on exponentiated scores is equivalent to modeling the preference probability using the sigmoid of the score difference.
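The equivalence above can be checked numerically. The sketch below (function names are illustrative, not from any particular library) computes the preference probability both ways and evaluates it for the scores r(A) = 2.0 and r(B) = -0.2 used in the worked example further down:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def pref_prob_ratio(r_a, r_b):
    """Bradley-Terry form: exp(r_a) / (exp(r_a) + exp(r_b))."""
    return math.exp(r_a) / (math.exp(r_a) + math.exp(r_b))

def pref_prob_sigmoid(r_a, r_b):
    """Equivalent simplified form: sigmoid of the score difference."""
    return sigmoid(r_a - r_b)

# Both forms agree, as the derivation shows:
p = pref_prob_sigmoid(2.0, -0.2)   # sigmoid(2.2) ≈ 0.90
```

Note that the two forms give identical results for any pair of scores, and that adding the same constant to both scores leaves the probability unchanged, since only the difference r_a - r_b enters the sigmoid.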

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A team is training a language model using human feedback. For a given prompt, the model generates two distinct responses, Response A and Response B. A human evaluator indicates a preference for Response A over Response B. To learn from this feedback, the system uses a probabilistic model designed for pairwise comparisons to quantify this preference. Which statement best analyzes how this model represents the human's choice?
Interpreting Preference Data for AI Training
Justifying the Choice of a Preference Model
Derivation of the Bradley-Terry Preference Formula
Pair-wise Ranking Loss Formula for RLHF Reward Model
Simplified Notation for Preference Probability Models
Reward Model Loss as Negative Log-Likelihood
Empirical Reward Model Loss Formula using Bradley-Terry Model
A system for evaluating generated text uses a scalar scoring function,
r(input, output), to assign a numerical score to each potential output. For a given input, 'Output A' receives a score of 2.0, and 'Output B' receives a score of -0.2. The system models the probability that one output is preferred over another using the sigmoid of the difference between their scores. Based on this model, what is the approximate probability that 'Output A' is preferred over 'Output B'?
Impact of Score Transformation on Preference Probabilities
Omission of Parameter Superscript in Probability Notation
A preference model calculates the probability that output Y_a is preferred over output Y_b by applying the sigmoid function to the difference in their scalar scores,
score(Y_a) - score(Y_b). If the initial scores for Y_a and Y_b result in a preference probability greater than 50% but less than 100%, which of the following transformations to the scores is guaranteed to leave this probability unchanged?
Learn After
A system models human preference between two generated responses, A and B, for a given prompt. It does this by first assigning a numerical reward score to each response, r(A) and r(B). The probability that response A is preferred over B is then calculated as Sigmoid(r(A) - r(B)). Based on this model, what happens to the predicted probability of preferring response A as the difference r(A) - r(B) becomes a very large positive number?
Interpreting Reward Model Scores
A preference model calculates the probability of response 'a' being preferred over response 'b' using their respective reward scores, r(a) and r(b). The initial formula is given as: P(a > b) = exp(r(a)) / (exp(r(a)) + exp(r(b))). Arrange the following algebraic steps in the correct order to simplify this expression into the form Sigmoid(r(a) - r(b)).