Constructing a Preference Data Sample from Human Feedback
A human labeler is tasked with creating a preference data sample. They are given a prompt and two generated responses. After reviewing them, the labeler chooses the better response and provides a rationale for that choice. Based on the information below, construct the correctly formatted data tuple (x, y_preferred, y_rejected) that would be used to train a reward model.
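For concreteness, here is a minimal Python sketch of such a data point. The placeholder strings and variable names are illustrative assumptions, not part of the exercise itself; only the field order (x, y_preferred, y_rejected) comes from the card.

# One preference data sample for reward-model training.
# Field order follows the exercise: (x, y_preferred, y_rejected).
x = "A prompt shown to the labeler."                      # input prompt
y_preferred = "The response the labeler judged better."   # chosen response
y_rejected = "The response the labeler judged worse."     # rejected response
sample = (x, y_preferred, y_rejected)

# A preference dataset is simply a collection of such tuples.
dataset = [sample]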
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
A team is creating a dataset to train a reward model. The model's objective is to learn to assign higher scores to helpful, detailed responses than to unhelpful or overly brief ones. For the input prompt x = 'Explain the water cycle.', which of the following data samples, represented as a tuple (prompt, chosen_response, rejected_response), would be the most effective and correctly structured training point for this objective?
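The pairwise ranking loss named in this card's title is, in its standard InstructGPT-style form (notation adapted here to this page's (x, y_preferred, y_rejected) tuples):

\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\, y_{\text{preferred}},\, y_{\text{rejected}}) \sim D}\left[\log \sigma\!\left(r_\theta(x, y_{\text{preferred}}) - r_\theta(x, y_{\text{rejected}})\right)\right]

where r_\theta is the reward model's scalar score, \sigma is the logistic sigmoid, and D is the preference dataset. Minimizing this loss pushes the model to score the chosen response above the rejected one.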
Constructing a Preference Data Sample from Human Feedback
A human evaluator is presented with the following prompt and two responses. The evaluator chooses Response A as the better one. This interaction is used to create a single data point for training a reward model, structured as a tuple containing an input prompt (x), a preferred response (y_k1), and a rejected response (y_k2). Match each item below to its correct role in this data sample.
Prompt: 'Summarize the plot of Hamlet in three sentences.'
Response A: 'Hamlet is a play about a prince who seeks revenge for his father's murder. He feigns madness, confronts his mother, and duels his uncle's co-conspirator, leading to a tragic end for the royal family.'
Response B: 'Hamlet is a famous play.'
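Mapped into code, this data point would look like the following sketch. The variable names x, y_k1, and y_k2 come from the card; the tuple wrapper is an assumption about how the sample is stored.

# Roles from the evaluator's judgment: Response A was chosen over Response B.
x = "Summarize the plot of Hamlet in three sentences."        # input prompt
y_k1 = ("Hamlet is a play about a prince who seeks revenge for his "
        "father's murder. He feigns madness, confronts his mother, and "
        "duels his uncle's co-conspirator, leading to a tragic end for "
        "the royal family.")                                  # preferred (Response A)
y_k2 = "Hamlet is a famous play."                             # rejected (Response B)
sample = (x, y_k1, y_k2)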
Preference Dataset Sampling Operation