1Cademy - Importance of Variability in Pairwise Preference Data

Learn Before

Pairwise Comparison for Human Feedback in RLHF

Concept

Importance of Variability in Pairwise Preference Data

Research indicates that significant variability within pairwise preference data is a key factor in successfully training Large Language Models, whether the feedback comes from humans or from AI systems.
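The text does not say how variability should be measured, so the following is only a minimal sketch under one assumption: that variability can be approximated by how much labelers disagree on each prompt. The function names, the 9-of-10 agreement cutoff, and the vote counts are all hypothetical and chosen purely for illustration.

```python
import math

def vote_entropy(votes_for_a: int, n_labelers: int) -> float:
    """Binary entropy (in bits) of the votes on one prompt.

    0.0 means every labeler picked the same response;
    1.0 means the labelers split evenly.
    """
    p = votes_for_a / n_labelers
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def summarize(votes, n_labelers=10, min_agree=9):
    """Summarize disagreement across prompts.

    votes: for each prompt, how many labelers preferred response A.
    min_agree: assumed cutoff for calling a prompt "near-unanimous".
    """
    entropies = [vote_entropy(v, n_labelers) for v in votes]
    near_unanimous = sum(
        1 for v in votes if max(v, n_labelers - v) >= min_agree
    )
    return {
        "mean_entropy_bits": sum(entropies) / len(entropies),
        "near_unanimous_fraction": near_unanimous / len(votes),
    }

# Hypothetical vote counts (out of 10 labelers) for ten prompts each.
aligned_votes = [10, 9, 10, 1, 0, 9, 10, 10, 1, 9]
diverse_votes = [6, 4, 8, 3, 7, 5, 9, 2, 6, 4]

aligned_summary = summarize(aligned_votes)
diverse_summary = summarize(diverse_votes)
print(aligned_summary)
print(diverse_summary)
```

Vote entropy is just one possible proxy. Other readings of "variability", such as diversity among the candidate responses themselves, would call for a different measure.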

Updated 2026-05-03

References

Reference of Foundations of Large Language Models Course

Learn After

A team is training a language model using preference data from a group of 10 labelers. For each prompt, the labelers are shown two potential responses and asked to choose the better one. The team considers two data collection strategies:
- Strategy 1: The team uses a highly aligned group of labelers who almost always agree. For 95% of the prompts, at least 9 out of 10 labelers choose the same response as the 'winner'.
- Strategy 2: The team uses a more diverse group of labelers. For
Diagnosing a Model Training Plateau
Evaluating Preference Datasets
