Concept

Annotation Simplicity in RLHF: Recognition over Demonstration

Preference-based annotation in RLHF is simpler than the demonstrations required for supervised learning because human values are often hard to articulate and even harder to exemplify perfectly. Instead of asking annotators to write ideal responses, RLHF presents them with several model-generated outputs and asks them to rank the outputs by preference. This works well when desired behavior is difficult to demonstrate but easy to recognize: the annotation task shifts from content creation to evaluation.
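
To make the contrast concrete, here is a minimal sketch, not taken from the source, of how ranked preference pairs are commonly turned into a training signal for a reward model. It assumes PyTorch; the toy `RewardModel` class and the random feature tensors are hypothetical stand-ins for encoded model outputs. The pairwise Bradley-Terry loss shown is a standard choice for RLHF reward modeling, though implementations vary.

```python
# Sketch: training a reward model from ranked preference annotations.
# Each training example is just a ranking (chosen vs. rejected output);
# no annotator ever has to write an ideal response.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Hypothetical toy scorer: higher score = more preferred response."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize the probability that the
    # annotator-preferred response outscores the rejected one.
    return -nn.functional.logsigmoid(r_chosen - r_rejected).mean()

model = RewardModel(dim=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy stand-ins for encoded candidate outputs from the same prompt.
chosen = torch.randn(4, 8)
rejected = torch.randn(4, 8)

optimizer.zero_grad()
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
```

The key point is visible in the data shape: the annotation enters only as which of two candidates was preferred, so the human effort per example is a comparison, not a composition.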

Updated 2026-04-20

Tags: Ch.2 Generative Models - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences