Bradley-Terry Model for Preference Probability
The probability that a response y_a is preferred over another response y_b (denoted y_a > y_b), given an input x, can be modeled using a formulation based on the Bradley-Terry model. This model defines the probability as a sigmoid function of the difference between their respective reward scores, r(x, y_a) and r(x, y_b). The formula is:
Pr(y_a > y_b | x) = Sigmoid(r(x, y_a) - r(x, y_b))
This maps the reward difference, which can be any real number, to a valid probability between 0 and 1.
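As a minimal sketch of this calculation, the snippet below computes the preference probability from two reward scores. The function name preference_probability and the numeric reward values are illustrative assumptions, not part of the source; in practice the scores would come from a trained reward model r(x, y).

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Return Pr(y_a > y_b | x) = Sigmoid(r(x, y_a) - r(x, y_b)).

    reward_a and reward_b stand in for r(x, y_a) and r(x, y_b);
    here they are plain floats chosen for illustration.
    """
    diff = reward_a - reward_b
    # Branch on the sign so exp() is only called on a non-positive
    # argument, which avoids overflow for large reward differences.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Equal rewards: the difference is 0, so neither response is favored.
p_equal = preference_probability(1.3, 1.3)

# Swapping the two responses gives complementary probabilities.
p_ab = preference_probability(2.0, 0.5)
p_ba = preference_probability(0.5, 2.0)

print(p_equal, p_ab, p_ba)
```

Two properties follow directly from the sigmoid: identical reward scores give a probability of exactly 0.5, and the probabilities for the two orderings of a pair sum to 1. Only the difference between the scores matters, so adding the same constant to both rewards leaves the result unchanged.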

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Learn After
A preference model calculates the probability that response y_a is preferred over response y_b for a given input x using the formula: Pr(y_a > y_b | x) = Sigmoid(r(x, y_a) - r(x, y_b)), where r(x, y) is a real-valued score for a given response. Based on this model, which of the following statements accurately describes its behavior?
A preference model calculates the probability that a 'winning' response, y_w, is preferred over a 'losing' response, y_l, for a given input x. The model uses the formula: Pr(y_w > y_l | x) = Sigmoid(r(x, y_w) - r(x, y_l)), where r(x, y) is a scalar reward score. In a specific training example, the reward scores for the two responses are found to be nearly identical, i.e., r(x, y_w) ≈ r(x, y_l). What does this imply about the calculated preference probability?
Interpreting Preference Model Output