Learn Before
A team is developing a language model to generate compelling short story endings. To gather human feedback, they generate four different endings for each story prompt. They are considering two feedback collection strategies:
Strategy 1: Human annotators are shown all four endings at once and asked to order them from best to worst.
Strategy 2: Human annotators are shown each of the four endings one at a time and asked to rate its quality on a scale of 1 to 10.
Based on the goal of collecting the most reliable data for model improvement, which strategy is generally more effective and why?
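To make the two annotation formats concrete, here is a minimal Python sketch of what a single feedback record might look like under each strategy (all field names and values are hypothetical):

    # Strategy 1 (listwise ranking): one total order over the four endings.
    ranking_record = {
        "prompt_id": "story_017",         # hypothetical identifier
        "ranking": ["B", "D", "A", "C"],  # endings ordered best to worst
    }

    # Strategy 2 (independent scoring): four 1-10 ratings, each given
    # without reference to the other endings.
    scoring_record = {
        "prompt_id": "story_017",
        "scores": {"A": 6, "B": 8, "C": 5, "D": 8},  # ties are possible
    }

    # A ranking of k items pins down k*(k-1)/2 pairwise preferences
    # (6 here), whereas independent scores are filtered through each
    # annotator's personal calibration of the 1-10 scale.
    implied_pairs = 4 * (4 - 1) // 2
    print(implied_pairs)  # 6

Note the structural difference: a ranking is a purely relative judgment, so it is unaffected by how generous or strict a given annotator's numeric scale happens to be.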
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of a Human Preference Ranking in RLHF
Listwise Loss from Accumulated Pairwise Comparisons
Plackett-Luce Model for Listwise Ranking
Example of Listwise Ranking in RLHF
Improving Feedback Collection for a Chatbot
When using a listwise ranking approach to collect human feedback for a language model, the primary task for an annotator is to order all of a prompt's generated outputs from best to worst, not to assign an independent numerical quality score (e.g., 1 to 10) to each output in isolation.
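A ranking collected this way can be turned into a training signal for a reward model via the Plackett-Luce model covered in the related card above. Below is a minimal PyTorch sketch of the Plackett-Luce negative log-likelihood for one annotated prompt; the function name and tensor values are illustrative, not a fixed API:

    import torch

    def plackett_luce_nll(scores: torch.Tensor) -> torch.Tensor:
        # `scores` holds reward-model scores for one prompt's endings,
        # arranged in the annotator's best-to-worst order (shape: [k]).
        # log P(ranking) = sum_i [ s_i - logsumexp(s_i, ..., s_k) ],
        # i.e., each ranked item "wins" against everything ranked below it.
        k = scores.shape[0]
        nll = scores.new_zeros(())
        for i in range(k):
            nll = nll - (scores[i] - torch.logsumexp(scores[i:], dim=0))
        return nll

    # Reward scores for four endings, already sorted best to worst.
    scores = torch.tensor([2.1, 1.3, 0.4, -0.7], requires_grad=True)
    loss = plackett_luce_nll(scores)
    loss.backward()  # gradients raise higher-ranked scores relative to lower ones
    print(loss.item())

Minimizing this loss over many annotated prompts trains the reward model to reproduce the annotators' orderings; it is the listwise analogue of the pairwise Bradley-Terry objective referenced in the related cards.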