1Cademy - A system learns a function, `r(input, response)`, that assigns a numerical score indicating the quality of a `response` for a given `input`. The probability that response `Y_a` is preferred over response `Y_b` is then calculated using the formula: `Probability = Sigmoid(r(input, Y_a) - r(input, Y_b))`, where `Sigmoid(z) = 1 / (1 + e^-z)`. Given the following scenarios for a single input, which one presents a logical inconsistency between the assigned scores and the resulting preference probability?
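The preference formula in the question can be checked numerically. Below is a minimal Python sketch, assuming the scores `r(input, Y_a)` and `r(input, Y_b)` are plain floats. The helper names `preference_probability` and `is_consistent`, the tolerance `tol`, and the example score values are hypothetical and are used only for illustration. They are not the question's answer options.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) so exp() never overflows.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def preference_probability(r_a: float, r_b: float) -> float:
    """Probability that Y_a is preferred over Y_b: Sigmoid(r(input, Y_a) - r(input, Y_b))."""
    return sigmoid(r_a - r_b)


def is_consistent(r_a: float, r_b: float, claimed_p: float, tol: float = 1e-3) -> bool:
    """Hypothetical check: does a claimed preference probability match the two scores?"""
    return abs(preference_probability(r_a, r_b) - claimed_p) <= tol


# Illustrative scores for a single input (hypothetical values).
print(preference_probability(2.0, 2.0))  # equal scores -> 0.5
print(preference_probability(3.0, 1.0))  # Y_a scored higher -> above 0.5
print(preference_probability(1.0, 3.0))  # Y_a scored lower -> below 0.5
print(is_consistent(1.0, 3.0, 0.9))      # lower-scored Y_a claimed as favored -> False
```

The probability depends only on the score difference. Equal scores must therefore give exactly 0.5, and the higher-scored response must receive a probability above 0.5. The two directions, `P(Y_a over Y_b)` and `P(Y_b over Y_a)`, must also sum to 1. A scenario violates the model when its stated probability breaks one of these properties.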

Learn Before

Modeling Pairwise Preference Probability with a Reward Function

Multiple Choice

Updated 2025-10-07
