Case Study

Constructing a Preference Data Sample from Human Feedback

A human labeler is tasked with creating a preference data sample. They are given a prompt and two generated responses. After reviewing them, the labeler provides a rationale for their choice. Based on the information below, construct the correctly formatted data tuple (x, y_preferred, y_rejected) that would be used to train a reward model.
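The tuple the labeler produces can be sketched as a small data structure. This is a minimal illustration, not the course's reference solution: the prompt and response strings below are hypothetical placeholders, since the actual case-study materials are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class PreferenceSample:
    """One preference data sample (x, y_preferred, y_rejected) for reward model training."""
    prompt: str      # x: the prompt shown to the labeler
    preferred: str   # y_preferred: the response the labeler chose
    rejected: str    # y_rejected: the response the labeler did not choose
    rationale: str   # the labeler's stated reason for the choice

# Hypothetical example values (not from the case study itself)
sample = PreferenceSample(
    prompt="Explain photosynthesis in one sentence.",
    preferred="Plants use sunlight, water, and CO2 to make glucose and release oxygen.",
    rejected="Photosynthesis is when plants eat sunlight to grow bigger.",
    rationale="The preferred response is factually accurate and complete.",
)
```

Keeping the rationale alongside the tuple is optional for training, but it helps audit labeler decisions later.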


Updated 2025-10-04


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science