Invariance of Preference Probability
A system models the probability that a response y_a is preferred over y_b for a given input x using the formula: P(y_a > y_b | x) = Sigmoid(r(x, y_a) - r(x, y_b)), where r(x, y) is a learned quality score. If a new reward function r'(x, y) is created by adding the same constant value C to all original scores (i.e., r'(x, y) = r(x, y) + C), how does this change affect the calculated preference probabilities? Explain your reasoning.
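The key observation is that the constant cancels inside the sigmoid: r'(x, y_a) - r'(x, y_b) = (r(x, y_a) + C) - (r(x, y_b) + C) = r(x, y_a) - r(x, y_b), so every preference probability is unchanged. A minimal numerical check of this invariance, using hypothetical scores (the values r_a, r_b, and C below are illustrative, not from the source):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def preference_prob(r_a: float, r_b: float) -> float:
    # P(y_a > y_b | x) = Sigmoid(r(x, y_a) - r(x, y_b))
    return sigmoid(r_a - r_b)

# Hypothetical reward scores for two responses to the same input
r_a, r_b = 2.5, 1.0
C = 100.0  # arbitrary constant added to every score

p_original = preference_prob(r_a, r_b)
p_shifted = preference_prob(r_a + C, r_b + C)

# The shift cancels in the difference, so the probabilities match
assert abs(p_original - p_shifted) < 1e-12
```

This also shows why a reward model's scores are only identified up to an additive constant: only score *differences* affect pairwise preference probabilities.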
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Listwise Loss Formula from Accumulated Pairwise Comparisons
Empirical Reward Model Loss Formula
Empirical Formulation of Pair-wise Ranking Loss
Preference Probability Calculation
A system learns a function, r(input, response), that assigns a numerical score indicating the quality of a response for a given input. The probability that response Y_a is preferred over response Y_b is then calculated using the formula: Probability = Sigmoid(r(input, Y_a) - r(input, Y_b)), where Sigmoid(z) = 1 / (1 + e^-z). Given the following scenarios for a single input, which one presents a logical inconsistency between the assigned scores and the resulting preference probability?