Derivation of the Bradley-Terry Preference Formula
The Bradley-Terry model can be used to express the probability that one item, $y_a$, is preferred over another, $y_b$, given a context $x$. The model starts by defining this probability as the ratio of the exponentiated reward score of the preferred item to the sum of the exponentiated scores of both items:

$$P(y_a \succ y_b \mid x) = \frac{\exp(r(x, y_a))}{\exp(r(x, y_a)) + \exp(r(x, y_b))}$$

This formulation can be algebraically simplified to the sigmoid function of the difference between the two reward scores. Dividing the numerator and the denominator by $\exp(r(x, y_a))$ gives

$$P(y_a \succ y_b \mid x) = \frac{1}{1 + \exp\big(r(x, y_b) - r(x, y_a)\big)} = \sigma\big(r(x, y_a) - r(x, y_b)\big),$$

where $\sigma(z) = 1/(1 + e^{-z})$ is the sigmoid function. This derivation shows how a model based on exponentiated scores is equivalent to modeling the preference probability using the sigmoid of the score difference.
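The equivalence above can be checked numerically. The sketch below (function names are illustrative, not from any particular library) computes the preference probability both ways and evaluates it for the scores r(A) = 2.0 and r(B) = -0.2 used in the worked example further down:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def pref_prob_ratio(r_a, r_b):
    """Bradley-Terry form: exp(r_a) / (exp(r_a) + exp(r_b))."""
    return math.exp(r_a) / (math.exp(r_a) + math.exp(r_b))

def pref_prob_sigmoid(r_a, r_b):
    """Equivalent simplified form: sigmoid of the score difference."""
    return sigmoid(r_a - r_b)

# Both forms agree, as the derivation shows:
p = pref_prob_sigmoid(2.0, -0.2)   # sigmoid(2.2) ≈ 0.90
```

Note that the two forms give identical results for any pair of scores, and that adding the same constant to both scores leaves the probability unchanged, since only the difference r_a - r_b enters the sigmoid.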

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A team is training a language model using human feedback. For a given prompt, the model generates two distinct responses, Response A and Response B. A human evaluator indicates a preference for Response A over Response B. To learn from this feedback, the system uses a probabilistic model designed for pairwise comparisons to quantify this preference. Which statement best analyzes how this model represents the human's choice?
Interpreting Preference Data for AI Training
Justifying the Choice of a Preference Model
Derivation of the Bradley-Terry Preference Formula
Pair-wise Ranking Loss Formula for RLHF Reward Model
Simplified Notation for Preference Probability Models
Reward Model Loss as Negative Log-Likelihood
Empirical Reward Model Loss Formula using Bradley-Terry Model
A system for evaluating generated text uses a scalar scoring function,
r(input, output), to assign a numerical score to each potential output. For a given input, 'Output A' receives a score of 2.0, and 'Output B' receives a score of -0.2. The system models the probability that one output is preferred over another using the sigmoid of the difference between their scores. Based on this model, what is the approximate probability that 'Output A' is preferred over 'Output B'?
Impact of Score Transformation on Preference Probabilities
Omission of Parameter Superscript in Probability Notation
A preference model calculates the probability that output Y_a is preferred over output Y_b by applying the sigmoid function to the difference in their scalar scores,
score(Y_a) - score(Y_b). If the initial scores for Y_a and Y_b result in a preference probability greater than 50% but less than 100%, which of the following transformations to the scores is guaranteed to leave this probability unchanged?
Learn After
A system models human preference between two generated responses, A and B, for a given prompt. It does this by first assigning a numerical reward score to each response, r(A) and r(B). The probability that response A is preferred over B is then calculated as Sigmoid(r(A) - r(B)). Based on this model, what happens to the predicted probability of preferring response A as the difference r(A) - r(B) becomes a very large positive number?
Interpreting Reward Model Scores
A preference model calculates the probability of response 'a' being preferred over response 'b' using their respective reward scores, r(a) and r(b). The initial formula is given as: P(a > b) = exp(r(a)) / (exp(r(a)) + exp(r(b))). Arrange the following algebraic steps in the correct order to simplify this expression into the form Sigmoid(r(a) - r(b)).