
Empirical Reward Model Loss Formula using Bradley-Terry Model

The reward model is trained by minimizing an empirical loss function derived from the Bradley-Terry model for pairwise comparisons. The objective is to adjust the model's parameters $\phi$ to minimize the negative log-likelihood of the observed human preferences in the dataset $\mathcal{D}_r$. For each comparison, the sigmoid function is applied to the difference between the reward scores of the preferred response $\mathbf{y}_a$ and the rejected response $\mathbf{y}_b$, and the negative logarithm of this probability is averaged over the entire dataset. The formula is:

$$\min_{\phi} \; - \frac{1}{|\mathcal{D}_r|} \sum_{(\mathbf{x},\mathbf{y}_a,\mathbf{y}_b) \in \mathcal{D}_r} \log \sigma\bigl(r_\phi(\mathbf{x},\mathbf{y}_a) - r_\phi(\mathbf{x},\mathbf{y}_b)\bigr)$$
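As a minimal sketch, the loss above can be computed directly from scalar reward scores. The scores below are hypothetical stand-ins for $r_\phi(\mathbf{x},\mathbf{y}_a)$ and $r_\phi(\mathbf{x},\mathbf{y}_b)$; in practice they would come from the reward model, and the loss would be minimized over $\phi$ by gradient descent.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def bt_loss(score_pairs):
    """Empirical Bradley-Terry loss: average negative log-likelihood
    over (r_a, r_b) pairs, where r_a is the reward of the preferred
    response and r_b the reward of the rejected one."""
    return -sum(math.log(sigmoid(r_a - r_b)) for r_a, r_b in score_pairs) / len(score_pairs)

# Hypothetical reward scores for two comparisons: in the first the
# preferred response scores well above the rejected one; in the
# second the model is indifferent (equal scores give a loss of log 2).
pairs = [(2.0, 0.5), (1.0, 1.0)]
loss = bt_loss(pairs)
```

Note that the loss depends only on the *difference* in scores, a consequence of the Bradley-Terry model: shifting every reward by a constant leaves the preference probabilities, and hence the loss, unchanged.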


Updated 2026-05-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
