A preference model calculates the probability that output Y_a is preferred over output Y_b by applying the sigmoid function to the difference in their scalar scores, score(Y_a) - score(Y_b). If the initial scores for Y_a and Y_b result in a preference probability greater than 50% but less than 100%, which of the following transformations to the scores is guaranteed to leave this probability unchanged?
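As a minimal sketch of the model described in the question, the snippet below computes the preference probability as the sigmoid of the score difference and compares two kinds of score transformation. The function names and the example scores (2.0 and -0.2) are illustrative assumptions, not part of the question itself.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def preference_probability(score_a: float, score_b: float) -> float:
    """Probability that output A is preferred over output B."""
    return sigmoid(score_a - score_b)


# Illustrative scores (assumed for this sketch).
score_a, score_b = 2.0, -0.2

base = preference_probability(score_a, score_b)

# Adding the same constant to both scores cancels in the difference.
shifted = preference_probability(score_a + 5.0, score_b + 5.0)

# Multiplying both scores by a constant rescales the difference.
scaled = preference_probability(2.0 * score_a, 2.0 * score_b)

print(f"base={base:.4f} shifted={shifted:.4f} scaled={scaled:.4f}")
```

Because the probability depends on the scores only through their difference, it helps to check what each candidate transformation does to score(Y_a) - score(Y_b) before evaluating the sigmoid.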
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
Simplified Notation for Preference Probability Models
Reward Model Loss as Negative Log-Likelihood
Empirical Reward Model Loss Formula using Bradley-Terry Model
A system for evaluating generated text uses a scalar scoring function, r(input, output), to assign a numerical score to each potential output. For a given input, 'Output A' receives a score of 2.0, and 'Output B' receives a score of -0.2. The system models the probability that one output is preferred over another using the sigmoid of the difference between their scores. Based on this model, what is the approximate probability that 'Output A' is preferred over 'Output B'?
Impact of Score Transformation on Preference Probabilities
Derivation of the Bradley-Terry Preference Formula
Omission of Parameter Superscript in Probability Notation