Interpreting Preference Model Notation
Based on the scenario, a novice might incorrectly conclude that the two teams' models are functionally identical because they produce the same output probability for this specific input. Identify the key element missing from the simplified Pr(·) notation that creates this ambiguity, and explain why its inclusion is necessary to distinguish between the predictions of two different underlying models.
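The snippet below is a minimal sketch of this ambiguity. It makes two assumptions that are not stated above: the preference probability takes a Bradley-Terry form, Pr_θ(Y_1 ≻ Y_2 | x) = σ(r_θ(x, Y_1) − r_θ(x, Y_2)), and the reward r_θ is a hypothetical two-feature linear model. The names `reward`, `pr_prefer`, `theta_a`, `theta_b` and all feature values are made up for illustration. With these assumptions, two parameter settings give the same probability on one input and opposite preferences on another.

```python
import math

# Assumption: Bradley-Terry preference model with a hypothetical linear reward
# r_theta(x, y) = theta . phi(x, y), where phi(x, y) is a feature vector for the
# (input, output) pair. All numbers below are invented for illustration.

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def reward(theta: list[float], phi: list[float]) -> float:
    # Linear reward: dot product of parameters and features
    return sum(t * f for t, f in zip(theta, phi))

def pr_prefer(theta: list[float], phi_y1: list[float], phi_y2: list[float]) -> float:
    # Pr_theta(y1 ≻ y2 | x) = sigmoid(r_theta(x, y1) - r_theta(x, y2))
    return sigmoid(reward(theta, phi_y1) - reward(theta, phi_y2))

theta_a = [1.0, 0.0]  # hypothetical parameters of model A
theta_b = [0.0, 1.0]  # hypothetical parameters of model B

# Features of (x, Y_1) and (x, Y_2) for two different inputs
phi_1 = ([2.0, 2.0], [1.0, 1.0])  # input 1: both models see a reward gap of 1
phi_2 = ([2.0, 0.0], [1.0, 1.0])  # input 2: the gaps are +1 (A) and -1 (B)

for label, (p1, p2) in [("input 1", phi_1), ("input 2", phi_2)]:
    pa = pr_prefer(theta_a, p1, p2)
    pb = pr_prefer(theta_b, p1, p2)
    print(f"{label}: Pr_thetaA = {pa:.4f}, Pr_thetaB = {pb:.4f}")
# input 1: Pr_thetaA = 0.7311, Pr_thetaB = 0.7311
# input 2: Pr_thetaA = 0.7311, Pr_thetaB = 0.2689
```

Both models print the same value for input 1, so the unsubscripted Pr(Y_1 ≻ Y_2 | x) cannot tell them apart there. Writing the parameters explicitly (Pr_θA versus Pr_θB) exposes that they are different functions, which input 2 confirms.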
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Interpreting Preference Model Notation
A research lab trains two different preference models, Model A and Model B, on the exact same dataset of human choices. When evaluating a specific input, the researchers find that for a pair of outputs (Y_1, Y_2), Model A calculates the probability that Y_1 is preferred over Y_2 as 0.8, whereas Model B calculates this same probability as 0.6. Both results are reported using the notation Pr(Y_1 ≻ Y_2 | input). What is the most accurate explanation for this discrepancy?
Evaluating Notational Simplification in Preference Models