Learn Before
A team is developing a language model to generate compelling short story endings. To gather human feedback, they generate four different endings for each story prompt. They are considering two feedback collection strategies:
Strategy 1: Human annotators are shown all four endings at once and asked to order them from best to worst.
Strategy 2: Human annotators are shown each of the four endings one at a time and asked to rate its quality on a scale of 1 to 10.
Based on the goal of collecting the most reliable data for model improvement, which strategy is generally more effective and why?
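To make the two annotation formats concrete, here is a minimal Python sketch of what a single feedback record might look like under each strategy (all field names and values are hypothetical):

    # Strategy 1 (listwise ranking): one total order over the four endings.
    ranking_record = {
        "prompt_id": "story_017",         # hypothetical identifier
        "ranking": ["B", "D", "A", "C"],  # endings ordered best to worst
    }

    # Strategy 2 (independent scoring): four 1-10 ratings, each given
    # without reference to the other endings.
    scoring_record = {
        "prompt_id": "story_017",
        "scores": {"A": 6, "B": 8, "C": 5, "D": 8},  # ties are possible
    }

    # A ranking of k items pins down k*(k-1)/2 pairwise preferences
    # (6 here), whereas independent scores are filtered through each
    # annotator's personal calibration of the 1-10 scale.
    implied_pairs = 4 * (4 - 1) // 2
    print(implied_pairs)  # 6

Note the structural difference: a ranking is a purely relative judgment, so it is unaffected by how generous or strict a given annotator's numeric scale happens to be.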
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of a Human Preference Ranking in RLHF
Listwise Loss from Accumulated Pairwise Comparisons
Plackett-Luce Model for Listwise Ranking
Example of Listwise Ranking in RLHF
Improving Feedback Collection for a Chatbot
When using a listwise ranking approach to collect human feedback for a language model, the primary task for an annotator is to order all of a prompt's generated outputs from best to worst, not to assign an independent numerical quality score (e.g., 1 to 10) to each output in isolation.
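A ranking collected this way can be turned into a training signal for a reward model via the Plackett-Luce model covered in the related card above. Below is a minimal PyTorch sketch of the Plackett-Luce negative log-likelihood for one annotated prompt; the function name and tensor values are illustrative, not a fixed API:

    import torch

    def plackett_luce_nll(scores: torch.Tensor) -> torch.Tensor:
        # `scores` holds reward-model scores for one prompt's endings,
        # arranged in the annotator's best-to-worst order (shape: [k]).
        # log P(ranking) = sum_i [ s_i - logsumexp(s_i, ..., s_k) ],
        # i.e., each ranked item "wins" against everything ranked below it.
        k = scores.shape[0]
        nll = scores.new_zeros(())
        for i in range(k):
            nll = nll - (scores[i] - torch.logsumexp(scores[i:], dim=0))
        return nll

    # Reward scores for four endings, already sorted best to worst.
    scores = torch.tensor([2.1, 1.3, 0.4, -0.7], requires_grad=True)
    loss = plackett_luce_nll(scores)
    loss.backward()  # gradients raise higher-ranked scores relative to lower ones
    print(loss.item())

Minimizing this loss over many annotated prompts trains the reward model to reproduce the annotators' orderings; it is the listwise analogue of the pairwise Bradley-Terry objective referenced in the related cards.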