Short Answer

Trade-offs in Human Feedback Collection Methods

A team is developing a system to improve a language model's conversational abilities using human feedback. They are debating between two methods for data collection:

  1. Method A: Annotators rate each model-generated response on a scale of 1 to 7.
  2. Method B: Annotators are shown two responses to the same prompt and must choose which one is better.

Analyze the primary challenge the team would face in ensuring data quality with Method A, and explain why Method B is often considered a more reliable alternative for this type of task.
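The contrast between the two methods can be illustrated with a small, hypothetical simulation (the annotator calibration offsets and quality scores below are invented for illustration only): two annotators who agree on which response is better can still produce very different absolute ratings, because each maps quality onto the 1–7 scale differently.

```python
# Hypothetical illustration: two annotators share the same sense of quality
# but calibrate the 1-7 scale differently (a "harsh" and a "lenient" rater).
def rate(quality, bias):
    # Method A: absolute rating, shifted by annotator bias, clipped to 1-7.
    return max(1, min(7, round(quality + bias)))

def prefer(q_a, q_b):
    # Method B: pairwise choice depends only on the quality ordering,
    # so annotator-specific scale calibration cancels out.
    return "A" if q_a > q_b else "B"

# Invented latent qualities of two responses to the same prompt.
q1, q2 = 4.6, 3.9

harsh, lenient = -1.5, +1.5  # invented annotator-specific calibration offsets

print(rate(q1, harsh), rate(q1, lenient))  # -> 3 6: same response, very different scores
print(prefer(q1, q2), prefer(q1, q2))      # -> A A: both annotators pick the same winner
```

The sketch shows why pairwise comparison data tends to be more consistent across annotators: Method B only requires agreement on an ordering, while Method A also requires agreement on how that ordering maps to a numeric scale.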

Updated 2025-10-10

Tags: Ch.2 Generative Models, Ch.4 Alignment (Foundations of Large Language Models); Analysis (Bloom's Taxonomy)