Comparison of Annotation Methods for Human Feedback in RLHF

When collecting human feedback for RLHF, there are two primary ways to evaluate model-generated outputs. One approach is to have annotators assign a direct numerical rating to each output, which frames reward-model training as a regression problem. This method is challenging in practice, however, because it is difficult to establish a consistent, universally accepted scoring standard across annotators. A more popular and simpler alternative is to have annotators rank the outputs by preference, a task humans perform far more reliably, as the sketch below illustrates.
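To make the contrast concrete, here is a minimal, hypothetical PyTorch sketch of how the two annotation schemes translate into different reward-model training objectives. The `RewardModel`, feature tensors, and dimensions are placeholders for illustration only, not taken from the source: direct numerical ratings lead to a mean-squared-error regression loss, while pairwise preference rankings lead to a Bradley-Terry style ranking loss.

```python
import torch
import torch.nn as nn

# Hypothetical reward model: any module that maps a pooled
# prompt-response representation to a scalar reward.
class RewardModel(nn.Module):
    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, hidden_dim) -> (batch,) scalar rewards
        return self.scorer(features).squeeze(-1)

model = RewardModel()

# --- Option 1: direct numerical ratings -> regression objective ---
features = torch.randn(4, 16)                  # placeholder features for 4 outputs
ratings = torch.tensor([3.0, 1.0, 4.0, 2.0])   # annotator scores on some fixed scale
regression_loss = nn.functional.mse_loss(model(features), ratings)

# --- Option 2: preference rankings -> pairwise (Bradley-Terry) objective ---
# Each pair holds a preferred ("chosen") and a dispreferred ("rejected") output.
chosen = torch.randn(4, 16)
rejected = torch.randn(4, 16)
ranking_loss = -nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()

print(float(regression_loss), float(ranking_loss))
```

Note that the ranking loss depends only on the difference in scores within a pair, which is why preference annotation does not require annotators to agree on an absolute scoring scale.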
