1Cademy - Pairwise Comparison for Human Feedback in RLHF

Learn Before

Comparison of Annotation Methods for Human Feedback in RLHF

Definition

Pairwise Comparison for Human Feedback in RLHF

Pairwise comparison, also called pairwise ranking, is a fundamental method for gathering human feedback in Reinforcement Learning from Human Feedback (RLHF). Given an input prompt $\mathbf{x}$, two candidate outputs, $\mathbf{y}_a$ and $\mathbf{y}_b$, are randomly drawn. A human expert then selects the preferred output according to specific criteria such as clarity, relevance, and accuracy. The choice is encoded as a binary label: $\mathbf{y}_a \succ \mathbf{y}_b$ if $\mathbf{y}_a$ is preferred, or $\mathbf{y}_b \succ \mathbf{y}_a$ if $\mathbf{y}_b$ is preferred.
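The definition above can be illustrated with a minimal Python sketch. It is not taken from the source: the names `PairwiseExample`, `draw_pair`, and `encode_preference` are hypothetical, and the convention of storing 1 for $\mathbf{y}_a \succ \mathbf{y}_b$ and 0 for $\mathbf{y}_b \succ \mathbf{y}_a$ is an assumption about how the binary label might be stored.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseExample:
    """One annotated comparison (hypothetical container, not from the source)."""
    prompt: str  # the input prompt x
    y_a: str     # first candidate output
    y_b: str     # second candidate output
    label: int   # assumed convention: 1 means y_a > y_b, 0 means y_b > y_a


def draw_pair(candidates, rng):
    """Randomly draw two distinct candidate outputs for the same prompt."""
    y_a, y_b = rng.sample(candidates, 2)
    return y_a, y_b


def encode_preference(prompt, y_a, y_b, preferred):
    """Turn the annotator's choice ("a" or "b") into a binary label."""
    if preferred not in ("a", "b"):
        raise ValueError("preferred must be 'a' or 'b'")
    return PairwiseExample(prompt, y_a, y_b, 1 if preferred == "a" else 0)


# Usage: the candidate strings stand in for sampled model outputs.
rng = random.Random(0)
candidates = ["summary one", "summary two", "summary three"]
y_a, y_b = draw_pair(candidates, rng)
example = encode_preference("Summarize the article.", y_a, y_b, preferred="a")
print(example.label)  # 1, because the annotator chose y_a
```

Only the relative judgment is stored here: the annotator never assigns an absolute score to either output.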

Updated 2026-06-30

References

Reference of Foundations of Large Language Models Course

Learn After

Evaluation Criteria for Pairwise Comparison in RLHF
Bradley-Terry Model
Reward Model Training as a Ranking Problem in RLHF
Listwise Ranking for Human Feedback in RLHF
Importance of Variability in Pairwise Preference Data
Evaluating a Feedback Collection Strategy
A development team is refining a language model's ability to generate summaries. For each source document, they have the model produce two different summaries. They then present these two summaries side-by-side to a human annotator and ask them to select the one that is of higher quality. Which statement best analyzes the primary strength of this specific approach for collecting human feedback?
Rationale for a Feedback Collection Method
Binary Encoding of Pairwise Feedback in RLHF
Preference Notation in Human Feedback
