1Cademy - Interpreting Preference Model Output

Learn Before

Bradley-Terry Model for Preference Probability

Short Answer

A preference model computes the probability that response y_a is preferred over response y_b for a given input x as Pr(y_a > y_b | x) = Sigmoid(r(x, y_a) - r(x, y_b)), where r(x, y) is the scalar reward score the model assigns to response y for input x. If the model outputs a probability of 0.95 for Pr(y_a > y_b | x), what can you conclude about the relative values of the reward scores r(x, y_a) and r(x, y_b)? Explain your reasoning.
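The Python sketch below works through the formula numerically. Inverting the sigmoid (the logit) recovers the reward gap: Sigmoid(d) = 0.95 gives d = ln(0.95 / 0.05) = ln(19), roughly 2.94. It also shows that only the difference r(x, y_a) - r(x, y_b) is pinned down, because adding the same constant to both rewards leaves the probability unchanged. The helper names `preference_probability` and `reward_gap_from_probability` are hypothetical and chosen only for illustration.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Pr(y_a > y_b | x) = Sigmoid(r(x, y_a) - r(x, y_b)) (hypothetical helper)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def reward_gap_from_probability(p: float) -> float:
    """Invert the sigmoid (logit) to recover r(x, y_a) - r(x, y_b) (hypothetical helper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


gap = reward_gap_from_probability(0.95)
print(f"r(x, y_a) - r(x, y_b) = {gap:.3f}")  # prints 2.944, i.e. ln(19)

# Only the difference is identified: shifting both rewards by the same
# constant leaves the preference probability unchanged.
for r_b in (0.0, 10.0, -3.0):
    r_a = r_b + gap
    # each line reports Pr = 0.95
    print(f"r_b = {r_b:6.2f}, r_a = {r_a:6.3f}, Pr = {preference_probability(r_a, r_b):.2f}")
```

So a probability of 0.95 means r(x, y_a) exceeds r(x, y_b) by about 2.94, while the absolute values of the two scores cannot be determined from the probability alone.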

Updated 2025-10-08
