Definition

Pairwise Comparison for Human Feedback in RLHF

Pairwise comparison, or pairwise ranking, is a fundamental method for gathering human feedback in Reinforcement Learning from Human Feedback (RLHF). Given an input prompt $\mathbf{x}$, two candidate outputs, $\mathbf{y}_a$ and $\mathbf{y}_b$, are randomly drawn. A human expert selects the preferred response based on specific criteria such as clarity, relevance, and accuracy. This preference is formally encoded as a binary label: $\mathbf{y}_a \succ \mathbf{y}_b$ if $\mathbf{y}_a$ is preferred, or $\mathbf{y}_b \succ \mathbf{y}_a$ if $\mathbf{y}_b$ is preferred.
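
The collection step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `PreferencePair`, `draw_pair`, and `collect_preference` are hypothetical, the candidate outputs are assumed to be already generated and held in a list, and the human expert is stood in for by any callable that returns True when the first output is preferred.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class PreferencePair:
    """One pairwise-comparison record (hypothetical structure)."""
    prompt: str        # input prompt x
    y_a: str           # first candidate output
    y_b: str           # second candidate output
    a_preferred: bool  # True: y_a preferred over y_b; False: y_b preferred over y_a

    @property
    def chosen(self) -> str:
        return self.y_a if self.a_preferred else self.y_b

    @property
    def rejected(self) -> str:
        return self.y_b if self.a_preferred else self.y_a


def draw_pair(candidates: Sequence[str], rng: random.Random) -> Tuple[str, str]:
    """Randomly draw two distinct candidates (sampling without replacement)."""
    y_a, y_b = rng.sample(list(candidates), 2)
    return y_a, y_b


def collect_preference(
    prompt: str,
    candidates: Sequence[str],
    annotator: Callable[[str, str, str], bool],
    rng: random.Random,
) -> PreferencePair:
    """Draw a pair, ask the annotator, and encode the answer as a binary label."""
    y_a, y_b = draw_pair(candidates, rng)
    return PreferencePair(prompt, y_a, y_b, a_preferred=annotator(prompt, y_a, y_b))


if __name__ == "__main__":
    # Toy stand-in for a human expert: prefers the shorter answer.
    # A real annotator would judge clarity, relevance, and accuracy.
    def toy_annotator(prompt: str, y_a: str, y_b: str) -> bool:
        return len(y_a) < len(y_b)

    rng = random.Random(0)
    outputs = ["Paris.", "The capital of France is Paris.", "It might be Lyon or Paris."]
    pair = collect_preference("What is the capital of France?", outputs, toy_annotator, rng)
    print("chosen:", pair.chosen)
    print("rejected:", pair.rejected)
```

Storing the label as a single boolean alongside both outputs keeps the record faithful to the binary encoding in the definition, while the `chosen` and `rejected` views are a convenience for downstream code that expects preferred and dispreferred responses by name.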

Updated 2026-05-01

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences