Learn Before
Bradley-Terry Model for Pairwise Preference Probability
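The probability that one output, $y_a$, is preferred over another, $y_b$, given an input $x$, can be defined using the Bradley-Terry model. This approach uses a reward function, $r(x, y)$, to assign a scalar score to each output. The preference probability is the sigmoid of the difference between the two scores:

$$\Pr(y_a \succ y_b \mid x) = \sigma\big(r(x, y_a) - r(x, y_b)\big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

Because the sigmoid maps any real number into the interval $(0, 1)$, this model turns an unbounded reward difference into a valid probability.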
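The following is a minimal Python sketch of the formula. It assumes the two reward scores $r(x, y_a)$ and $r(x, y_b)$ have already been produced by some reward model. The function names and the example scores are hypothetical. The sketch also computes the algebraically equivalent form $e^{r_a} / (e^{r_a} + e^{r_b})$, which is how the Bradley-Terry model is often written, so you can check that the two forms agree.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid, 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as exp(z) / (1 + exp(z)) so exp never overflows.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def preference_prob(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that output a is preferred over output b,
    given scalar rewards r(x, y_a) and r(x, y_b) for the same input x."""
    return sigmoid(reward_a - reward_b)


def preference_prob_softmax(reward_a: float, reward_b: float) -> float:
    """Equivalent form exp(r_a) / (exp(r_a) + exp(r_b)).
    Both rewards are shifted by their maximum before exponentiating, for stability."""
    m = max(reward_a, reward_b)
    ea = math.exp(reward_a - m)
    eb = math.exp(reward_b - m)
    return ea / (ea + eb)


if __name__ == "__main__":
    # Hypothetical rewards for two outputs y_a and y_b given the same input x.
    p = preference_prob(1.5, 0.5)
    print(f"Pr(y_a > y_b | x) = {p:.4f}")  # prints 0.7311
    # The reverse preference is the complement: the two probabilities sum to 1.
    print(f"Pr(y_b > y_a | x) = {preference_prob(0.5, 1.5):.4f}")  # prints 0.2689
```

Only the difference between the two rewards enters the sigmoid. As a result, equal rewards give a probability of exactly 0.5, and swapping the two outputs gives the complementary probability.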

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Bradley-Terry Model for Pairwise Preference Probability
Ranking Chatbot Responses
A user gives a language model the prompt x: 'Translate the phrase "hello world" into French.' The model generates two responses: Response A (y_A), 'Bonjour le monde', and Response B (y_B), 'Salut monde'. A human evaluator judges Response A to be the better translation. Which of the following expressions correctly represents the probability of this specific preference, given the user's prompt?
Modeling Pairwise Preference Probability with a Reward Function
Interpreting Preference Probability Notation
Learn After
Pair-wise Ranking Loss Formula for RLHF Reward Model
Simplified Notation for Preference Probability Models
Reward Model Loss as Negative Log-Likelihood
Empirical Reward Model Loss Formula using Bradley-Terry Model
A system for evaluating generated text uses a scalar scoring function, r(input, output), to assign a numerical score to each potential output. For a given input, 'Output A' receives a score of 2.0, and 'Output B' receives a score of -0.2. The system models the probability that one output is preferred over another using the sigmoid of the difference between their scores. Based on this model, what is the approximate probability that 'Output A' is preferred over 'Output B'?
Impact of Score Transformation on Preference Probabilities
Derivation of the Bradley-Terry Preference Formula
Omission of Parameter Superscript in Probability Notation
A preference model calculates the probability that output Y_a is preferred over output Y_b by applying the sigmoid function to the difference in their scalar scores, score(Y_a) - score(Y_b). If the initial scores for Y_a and Y_b yield a preference probability greater than 50% but less than 100%, which of the following transformations to the scores is guaranteed to leave this probability unchanged?