A preference model calculates the probability of response 'a' being preferred over response 'b' using their respective reward scores, r(a) and r(b). The initial formula is given as: P(a > b) = exp(r(a)) / (exp(r(a)) + exp(r(b))). Arrange the following algebraic steps in the correct order to simplify this expression into the form Sigmoid(r(a) - r(b)).
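The target identity can be checked numerically. Dividing the numerator and denominator of the initial formula by exp(r(a)) gives 1 / (1 + exp(-(r(a) - r(b)))), which is the sigmoid of the score difference. The sketch below is only an illustration: the function names pairwise_preference, sigmoid, and max_gap, and the sample score pairs, are assumptions made for this example and do not come from the source.

```python
import math

def pairwise_preference(r_a: float, r_b: float) -> float:
    """Initial form: exp(r(a)) / (exp(r(a)) + exp(r(b)))."""
    return math.exp(r_a) / (math.exp(r_a) + math.exp(r_b))

def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def max_gap(pairs) -> float:
    """Largest absolute difference between the two forms over the given score pairs."""
    return max(abs(pairwise_preference(a, b) - sigmoid(a - b)) for a, b in pairs)

# Hypothetical reward scores (r(a), r(b)), chosen only for illustration.
pairs = [(0.0, 0.0), (2.0, 1.0), (-1.5, 3.0), (4.0, -2.0)]

# The two forms should agree up to floating-point rounding.
print(max_gap(pairs))
```

Only the difference r(a) - r(b) enters the sigmoid form, so adding the same constant to both scores leaves the predicted probability unchanged.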
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A system models human preference between two generated responses, A and B, for a given prompt. It does this by first assigning a numerical reward score to each response, r(A) and r(B). The probability that response A is preferred over B is then calculated as Sigmoid(r(A) - r(B)). Based on this model, what happens to the predicted probability of preferring response A as the difference r(A) - r(B) becomes a very large positive number?
Interpreting Reward Model Scores