Comparison

Comparison of Pointwise vs. Relative Preference Methods in RLHF

The main difference between pointwise and relative preference methods lies in their training objective. Pointwise methods train the model to predict an absolute score for each output, which is a disadvantage when human-provided scores are inconsistent, for example when annotators use the rating scale differently. Relative preference methods instead learn from comparative judgments between outputs, so only the ordering of the outputs matters. This focus on relative differences is beneficial because it encourages the model to learn more generalizable patterns of what makes one response better than another.
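
The contrast between the two objectives can be made concrete with a small sketch. The function names, the scores, and the two-annotator scenario below are hypothetical and chosen only for illustration. The pointwise loss is assumed to be a squared error against the human score, and the relative loss is assumed to be a Bradley-Terry style pairwise loss, which is one common choice for reward models; the text above does not prescribe either form.

```python
import math


def pointwise_loss(predicted_score, human_score):
    """Squared error against an absolute human-assigned score (assumed form)."""
    return (predicted_score - human_score) ** 2


def pairwise_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log(sigmoid(score_preferred - score_rejected))."""
    margin = score_preferred - score_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical data: two annotators agree that response "good" beats
# response "bad", but they use the rating scale differently.
annotator_a = {"good": 9.0, "bad": 6.0}
annotator_b = {"good": 5.0, "bad": 2.0}

# Hypothetical reward-model outputs for the same two responses.
model = {"good": 7.0, "bad": 4.0}

# Pointwise: the model is penalized under both annotators, even though
# it orders the two responses correctly.
loss_a = sum(pointwise_loss(model[k], annotator_a[k]) for k in model)
loss_b = sum(pointwise_loss(model[k], annotator_b[k]) for k in model)

# Pairwise: only the score difference matters, so both annotators
# provide the same training signal ("good" is preferred over "bad").
loss_pair = pairwise_loss(model["good"], model["bad"])

print(f"pointwise loss vs annotator A: {loss_a}")
print(f"pointwise loss vs annotator B: {loss_b}")
print(f"pairwise loss: {loss_pair:.4f}")
```

Under these assumptions the pairwise loss is unchanged if every score is shifted by the same constant, which is why disagreement about the absolute scale does not affect it.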

Updated 2025-10-10

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences