Multiple Choice

An AI team is preparing a dataset for training a reward model. They present pairs of model-generated text, (Response 1, Response 2), to human labelers. The team's convention is to encode the preference as '1' if Response 1 is chosen, and '0' if Response 2 is chosen. Given the following three labeling results, what is the correct sequence of binary labels that should be recorded for the dataset?

  • Session 1: The pair (Text A, Text B) was shown, and the labeler chose Text B.
  • Session 2: The pair (Text C, Text D) was shown, and the labeler chose Text C.
  • Session 3: The pair (Text E, Text F) was shown, and the labeler chose Text F.
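The encoding convention can be checked mechanically. The following is a minimal sketch, assuming each session is stored as the displayed pair plus the chosen text; the helper name `encode_preference` and the `sessions` structure are hypothetical and used only for illustration.

```python
def encode_preference(pair, chosen):
    """Return 1 if the first response in the pair was chosen, 0 if the second was."""
    first, second = pair
    if chosen == first:
        return 1
    if chosen == second:
        return 0
    # Guard against a recorded choice that was never shown to the labeler.
    raise ValueError(f"{chosen!r} is not in the displayed pair {pair!r}")


# The three sessions from the question: (displayed pair, chosen text).
sessions = [
    (("Text A", "Text B"), "Text B"),
    (("Text C", "Text D"), "Text C"),
    (("Text E", "Text F"), "Text F"),
]

labels = [encode_preference(pair, chosen) for pair, chosen in sessions]
print(labels)  # [0, 1, 0]
```

Because the label depends on the position of the chosen text within the pair, not on which text it is, the display order has to be stored alongside each choice.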

Updated 2025-10-10

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science