Case Study

Impact of Inconsistent Labels on Reward Model Training

A team is training a model to classify text segments as either 'appropriate' or 'inappropriate'. Training minimizes a classification loss function that measures the difference between the model's predictions and the ground-truth labels provided by human annotators. Suppose the human-provided labels are highly inconsistent (e.g., very similar segments often receive opposite labels). Analyze the specific impact this inconsistency would have on the role of the loss function and on overall model training.
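The effect described in the prompt can be sketched numerically. The toy experiment below (an illustrative assumption, not part of the original case study) trains the same logistic classifier on consistent labels and on labels where a fraction of similar examples receive the opposite label, then compares the achievable cross-entropy loss. All names (`train_logistic`, the 1-D feature, the 25% flip rate) are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logistic(x, y, steps=2000, lr=0.5):
    """Minimal 1-D logistic regression trained by gradient descent;
    returns the final mean cross-entropy loss on the training data."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))  # sigmoid predictions
        g = p - y                               # gradient of cross-entropy w.r.t. logits
        w -= lr * np.mean(g * x)
        b -= lr * np.mean(g)
    p = np.clip(1.0 / (1.0 + np.exp(-(w * x + b))), 1e-9, 1 - 1e-9)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# "Segments" reduced to a 1-D feature; true rule: appropriate iff x > 0.
x = rng.normal(size=2000)
y_clean = (x > 0).astype(float)

# Inconsistent annotators: 25% of segments get the opposite label,
# so near-identical inputs can carry conflicting targets.
flip = rng.random(2000) < 0.25
y_noisy = np.where(flip, 1 - y_clean, y_clean)

loss_clean = train_logistic(x, y_clean)
loss_noisy = train_logistic(x, y_noisy)
print(f"clean-label loss: {loss_clean:.3f}")
print(f"noisy-label loss: {loss_noisy:.3f}")
```

With consistent labels the loss can be driven toward zero, so the loss signal cleanly guides learning. With conflicting labels no decision rule can satisfy both copies of a near-duplicate input, so the loss is floored by an irreducible noise term and its gradient partly rewards memorizing annotator noise rather than learning the underlying notion of appropriateness.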


Updated 2025-10-05


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science