Learn Before
An AI team is preparing a dataset for training a reward model. They present pairs of model-generated text, (Response 1, Response 2), to human labelers. The team's convention is to encode the preference as '1' if Response 1 is chosen, and '0' if Response 2 is chosen. Given the following three labeling results, what is the correct sequence of binary labels that should be recorded for the dataset?
- Session 1: The pair (Text A, Text B) was shown, and the labeler chose Text B.
- Session 2: The pair (Text C, Text D) was shown, and the labeler chose Text C.
- Session 3: The pair (Text E, Text F) was shown, and the labeler chose Text F.
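The convention above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `encode_preference` and the `sessions` list are hypothetical and are not part of the original exercise. Each session is stored as the pair shown plus the text the labeler chose, and the label depends only on the position of the chosen text in the pair.

```python
from typing import List, Tuple


def encode_preference(pair: Tuple[str, str], chosen: str) -> int:
    """Return 1 if the first response in the pair was chosen, 0 if the second.

    Hypothetical helper that mirrors the team's stated convention.
    """
    first, second = pair
    if chosen == first:
        return 1
    if chosen == second:
        return 0
    raise ValueError(f"{chosen!r} is not one of the responses in {pair!r}")


# The three labeling sessions from the question: (pair shown, text chosen).
sessions: List[Tuple[Tuple[str, str], str]] = [
    (("Text A", "Text B"), "Text B"),
    (("Text C", "Text D"), "Text C"),
    (("Text E", "Text F"), "Text F"),
]

labels = [encode_preference(pair, chosen) for pair, chosen in sessions]
print(labels)  # [0, 1, 0]
```

The label records which position won, not which text it was, so the order in which the pair was presented has to be stored alongside the label for it to be interpretable later.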
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A human labeler is tasked with providing feedback on two different AI-generated summaries of an article, labeled Summary A and Summary B. After reviewing both, the labeler selects Summary B as the better one. In a typical system that uses pairwise comparisons to gather human feedback, how is this single preference decision mathematically encoded for the training process?
In a system that collects human feedback by presenting two model-generated responses for comparison, if a human evaluator strongly prefers response A over response B, this preference is encoded with a higher numerical value than if they only slightly preferred response A.
Representing Preference Data