
Learn Before

Binary Encoding of Pairwise Feedback in RLHF

Multiple Choice

An AI team is preparing a dataset for training a reward model. They present pairs of model-generated text, (Response 1, Response 2), to human labelers. The team's convention is to encode the preference as '1' if Response 1 is chosen, and '0' if Response 2 is chosen. Given the following three labeling results, what is the correct sequence of binary labels that should be recorded for the dataset?

Session 1: The pair (Text A, Text B) was shown, and the labeler chose Text B.
Session 2: The pair
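
The encoding convention in the question can be sketched as a small function. This is a minimal illustration, not a reference implementation: the name `encode_preference` and the tuple representation of a pair are assumptions made for this example. It is applied only to Session 1, the one session fully stated above. The label depends on the position of the chosen text within the displayed pair, not on which text it is.

```python
def encode_preference(pair, chosen):
    """Return 1 if the first response in the pair was chosen, 0 if the second.

    `pair` is a (Response 1, Response 2) tuple in the order shown to the labeler.
    Hypothetical helper for illustration only.
    """
    first, second = pair
    if chosen == first:
        return 1
    if chosen == second:
        return 0
    raise ValueError("chosen response is not part of the displayed pair")


# Session 1 from the question: (Text A, Text B) was shown, labeler chose Text B.
# Text B is in the Response 2 position, so the recorded label is 0.
labels = [encode_preference(("Text A", "Text B"), "Text B")]
print(labels)  # [0]
```

Had the same two texts been shown in the opposite order, (Text B, Text A), the same choice of Text B would be recorded as 1.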

Updated 2025-10-10
