Learn Before
Modeling Preference Probability with the Bradley-Terry Model in RLHF
In the context of RLHF, the Bradley-Terry model is adapted to formally express the probability that a given model output, y_1, is preferred over another, y_2. Concretely, with a reward model r assigning a scalar score to each output, the preference probability is P(y_1 ≻ y_2) = exp(r(y_1)) / (exp(r(y_1)) + exp(r(y_2))) = σ(r(y_1) − r(y_2)), where σ is the logistic sigmoid. This application of the model provides a mathematical framework for quantifying human preferences.
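As a minimal sketch of this idea, the snippet below computes the Bradley-Terry preference probability from two scalar rewards using the sigmoid-of-difference form; the function name and example reward values are illustrative, not taken from the source.

```python
import math

def bradley_terry_preference(reward_1: float, reward_2: float) -> float:
    """Probability that output y_1 is preferred over y_2 under the
    Bradley-Terry model, given scalar rewards r(y_1) and r(y_2).

    P(y_1 > y_2) = exp(r_1) / (exp(r_1) + exp(r_2)) = sigmoid(r_1 - r_2)
    """
    return 1.0 / (1.0 + math.exp(-(reward_1 - reward_2)))

# Equal rewards: the model assigns a 50/50 preference.
print(bradley_terry_preference(1.0, 1.0))  # 0.5
# A higher reward for y_1 makes y_1 more likely to be preferred.
print(bradley_terry_preference(2.0, 0.0))  # ~0.881
```

Note that only the difference of the rewards matters: shifting both rewards by the same constant leaves the preference probability unchanged.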
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Plackett-Luce Model for Listwise Ranking
Evaluating a Preference Model's Suitability
A research team is developing a system to determine the best-tasting coffee blend. They collect data by presenting human tasters with two different blends at a time and asking them to choose which one they prefer. The team wants to use this data to build a probabilistic model that can predict the likelihood of one blend being chosen over another. Which of the following modeling approaches is most directly suited for this specific data collection method and goal?
Notation for a List of Outputs in Ranking
Evaluating a Model's Assumptions in a Dynamic Context
Learn After
A team is training a language model using human feedback. For a given prompt, the model generates two distinct responses, Response A and Response B. A human evaluator indicates a preference for Response A over Response B. To learn from this feedback, the system uses a probabilistic model designed for pairwise comparisons to quantify this preference. Which statement best analyzes how this model represents the human's choice?
Interpreting Preference Data for AI Training
Justifying the Choice of a Preference Model
Derivation of the Bradley-Terry Preference Formula