Case Study

Handling Labeler Disagreement in Reward Modeling

Using the empirical formulation of the pairwise ranking loss, which incorporates preference probabilities, explain how the 70/30 split in labeler preference for this data point influences the loss calculation and the subsequent update to the model's parameters. Contrast this with a scenario in which all 10 labelers agree that y_A is the preferred response.
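The comparison the question asks for can be explored numerically. The sketch below is only an illustration under stated assumptions: it assumes the 70% majority favors y_A, that the reward model scores a pair through the difference delta = r(x, y_A) - r(x, y_B), and that the loss takes the common Bradley-Terry form with a soft label, L = -[p log sigmoid(delta) + (1 - p) log sigmoid(-delta)]. The function names are hypothetical and not taken from the course material.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_pairwise_loss(delta, p_a):
    """Pairwise ranking loss with an empirical preference probability.

    delta: assumed reward difference r(x, y_A) - r(x, y_B)
    p_a:   fraction of labelers preferring y_A (0.7 or 1.0 here)
    """
    return -(p_a * math.log(sigmoid(delta))
             + (1.0 - p_a) * math.log(sigmoid(-delta)))


def grad_wrt_delta(delta, p_a):
    """Derivative of the loss with respect to delta: sigmoid(delta) - p_a."""
    return sigmoid(delta) - p_a


if __name__ == "__main__":
    # Compare the 70/30 split with unanimous agreement at a few reward gaps.
    for p_a in (0.7, 1.0):
        for delta in (0.0, 1.0, 3.0):
            loss = soft_pairwise_loss(delta, p_a)
            grad = grad_wrt_delta(delta, p_a)
            print(f"p_A={p_a:.1f} delta={delta:.1f} "
                  f"loss={loss:.4f} dL/ddelta={grad:+.4f}")
```

Under these assumptions the gradient with respect to delta is sigmoid(delta) - p. With p = 0.7 it vanishes once sigmoid(delta) = 0.7, that is at delta = ln(7/3), and turns positive beyond that point, so the update pulls an overconfident reward gap back down. With p = 1.0 the gradient is negative for every finite delta, so each update keeps widening the gap in favor of y_A, with a magnitude that shrinks as delta grows.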


Updated 2025-10-09

