A research lab trains two different preference models, Model A and Model B, on the exact same dataset of human choices. When evaluating a specific input, they find that for a pair of outputs (Y_1, Y_2), Model A calculates the probability that Y_1 is preferred over Y_2 as 0.8, while Model B calculates this same probability as 0.6. The lab reports both findings using the notation Pr(Y_1 ≻ Y_2 | input). What is the most accurate explanation for this discrepancy?
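The scenario can be made concrete with a Bradley-Terry-style sketch: each model's Pr(Y_1 ≻ Y_2 | input) is a deterministic function of that model's own learned reward scores, so two models fit to the same human-choice data can still report different probabilities for the same pair. A minimal sketch, where all reward values are hypothetical and chosen only to reproduce the 0.8 and 0.6 figures:

```python
import math

def preference_prob(r1: float, r2: float) -> float:
    """Bradley-Terry preference probability: sigmoid of the reward gap."""
    return 1.0 / (1.0 + math.exp(-(r1 - r2)))

# Hypothetical learned reward scores for (Y_1, Y_2) under each model.
# Model A's reward gap is log(4), so sigmoid(log 4) = 4/5 = 0.8.
prob_a = preference_prob(2.0, 2.0 - math.log(4))
# Model B's reward gap is log(1.5), so sigmoid(log 1.5) = 1.5/2.5 = 0.6.
prob_b = preference_prob(1.0, 1.0 - math.log(1.5))

print(prob_a)  # → 0.8
print(prob_b)  # → 0.6
```

The point of the sketch is that the notation Pr(Y_1 ≻ Y_2 | input) names a model's *estimate* of the preference probability, not a single ground-truth quantity, so two differently parameterized models trained on identical data need not agree.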
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Interpreting Preference Model Notation
Evaluating Notational Simplification in Preference Models