Preference Probability Calculation
An AI development team is refining a language model using a reward function, r(prompt, response), which assigns a quality score to a given response. The probability that a preferred response (y_a) is chosen over another response (y_b) is determined by the formula: Pr(y_a ≻ y_b | prompt) = Sigmoid(r(prompt, y_a) - r(prompt, y_b)), where Sigmoid(z) = 1 / (1 + e^-z). Given the scenario below, calculate the probability that Response A is preferred over Response B. Show your calculation and provide the final probability rounded to two decimal places.
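Since the card's concrete scenario values are not reproduced here, a minimal sketch of the calculation with hypothetical reward scores (the values 2.0 and 1.0 are illustrative, not from the card):

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def preference_probability(score_a: float, score_b: float) -> float:
    """Pr(y_a is preferred over y_b) = Sigmoid(r(prompt, y_a) - r(prompt, y_b))."""
    return sigmoid(score_a - score_b)

# Hypothetical scores: r(prompt, Response A) = 2.0, r(prompt, Response B) = 1.0
p = preference_probability(2.0, 1.0)
print(round(p, 2))  # 0.73
```

Note that equal scores give a probability of exactly 0.5, and swapping the two responses gives the complementary probability, since Sigmoid(-z) = 1 - Sigmoid(z).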
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Listwise Loss Formula from Accumulated Pairwise Comparisons
Empirical Reward Model Loss Formula
Empirical Formulation of Pair-wise Ranking Loss
A system learns a function, r(input, response), that assigns a numerical score indicating the quality of a response for a given input. The probability that response Y_a is preferred over response Y_b is then calculated using the formula: Probability = Sigmoid(r(input, Y_a) - r(input, Y_b)), where Sigmoid(z) = 1 / (1 + e^-z). Given the following scenarios for a single input, which one presents a logical inconsistency between the assigned scores and the resulting preference probability?
Preference Probability Calculation
Invariance of Preference Probability
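The "Invariance of Preference Probability" item above refers to the fact that the probability depends only on the score difference, so adding the same constant to both reward scores leaves it unchanged. A quick sketch with hypothetical scores (3.0 and 1.0, shifted by +5):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Preference probability depends only on the difference r_a - r_b,
# so a constant shift applied to both scores cancels out.
p_original = sigmoid(3.0 - 1.0)                # scores 3.0 vs 1.0
p_shifted = sigmoid((3.0 + 5.0) - (1.0 + 5.0))  # both scores shifted by +5
print(p_original == p_shifted)  # True
```

This is why reward models trained from pairwise preferences are only identified up to an additive constant.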