Learn Before
Handling Labeler Disagreement in Reward Modeling
Based on the empirical formulation of the pair-wise ranking loss, which incorporates preference probabilities, explain how the 70/30 split in labeler preference for this specific data point influences the loss calculation and the subsequent update to the model's parameters. Contrast this with a scenario where all 10 labelers agreed that y_A was the preferred response.
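The contrast above can be made concrete with a small sketch. Assuming the standard preference-probability-weighted pairwise ranking loss, L = -[p·log σ(Δ) + (1-p)·log σ(-Δ)], where Δ = R(x, y_A) - R(x, y_B) and p is the fraction of labelers preferring y_A (function names here are illustrative, not from the source):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weighted_pairwise_loss(delta, p):
    """Preference-weighted pairwise ranking loss.

    delta: R(x, y_A) - R(x, y_B), the reward margin.
    p: empirical preference probability for y_A (e.g. 0.7 for a 70/30 split).
    """
    return -(p * math.log(sigmoid(delta)) + (1 - p) * math.log(sigmoid(-delta)))

def grad_wrt_delta(delta, p):
    """Derivative of the loss w.r.t. delta; algebraically it simplifies
    to sigmoid(delta) - p, so the gradient vanishes when sigmoid(delta) = p."""
    return sigmoid(delta) - p

# 70/30 split: the gradient is zero at a *finite* margin,
# delta* = log(p / (1 - p)) = log(0.7 / 0.3) ~= 0.85,
# so the update nudges the scores toward a moderate gap and then stops.
print(grad_wrt_delta(0.0, 0.7))                    # -0.2: modest pull on the margin
print(grad_wrt_delta(math.log(0.7 / 0.3), 0.7))    # ~0: equilibrium reached

# Unanimous 10/10 agreement (p = 1.0): the gradient sigmoid(delta) - 1
# is negative for every finite delta, so optimization keeps widening
# the gap between R(x, y_A) and R(x, y_B).
print(grad_wrt_delta(0.0, 1.0))                    # -0.5: stronger, persistent pull
```

The key difference: with p = 0.7 the loss has an interior minimum, so the parameter update is both smaller in magnitude and self-limiting, whereas p = 1.0 produces a larger gradient that never fully decays, pushing the margin to grow without bound (in practice checked only by regularization and finite training).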
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A reward model is being trained using a pair-wise ranking loss function. For a given prompt x, the preference dataset contains a pair of responses: a preferred response y_pref and a rejected response y_rej. Initially, the model assigns the following scores: R(x, y_pref) = 2.0 and R(x, y_rej) = 3.0. Based on the objective of the loss function, what is the most likely change to these scores after a single optimization step on this data point?
Analysis of a Weighted Ranking Loss
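The expected answer can be checked numerically. Under the usual unweighted pairwise loss -log σ(R(x, y_pref) - R(x, y_rej)), a gradient step raises the preferred score and lowers the rejected one. A minimal sketch, assuming plain SGD with an illustrative learning rate (the function name and lr value are assumptions, not from the source):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(r_pref, r_rej, lr=0.1):
    """One SGD step on loss = -log sigmoid(r_pref - r_rej).

    The gradient w.r.t. r_pref is -sigmoid(-(r_pref - r_rej)) and the
    gradient w.r.t. r_rej is its negation, so both scores move by the
    same magnitude in opposite directions.
    """
    g = sigmoid(-(r_pref - r_rej))  # shared gradient magnitude
    return r_pref + lr * g, r_rej - lr * g

# Starting from the card's scores: the ranking is currently wrong
# (2.0 < 3.0), so sigmoid(-delta) = sigmoid(1.0) ~= 0.73 is large and
# the correction is correspondingly strong.
new_pref, new_rej = sgd_step(2.0, 3.0)
print(new_pref, new_rej)  # r_pref rises above 2.0, r_rej falls below 3.0
```

After the step, R(x, y_pref) increases and R(x, y_rej) decreases, shrinking the incorrect 1.0 gap; repeated steps would eventually flip the ordering so the preferred response scores higher.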