1Cademy - Representing Preference Data

Learn Before

Binary Encoding of Pairwise Feedback in RLHF

Short Answer

Representing Preference Data

In a system designed to learn from human feedback, an expert is repeatedly shown two model-generated text snippets, A and B, and asked to choose which one is better. Describe the most common and simplest way this single choice (e.g., 'A is better than B') is formally represented as a data point for training a model.
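As a point of reference, the binary encoding named in the prerequisite above can be sketched as a small record holding the two snippets and a single 0/1 label. This is a minimal illustration, not a definitive format: the class name `PreferenceRecord`, its field names, the example strings, and the convention that 1 means "A is preferred" are all assumptions made here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PreferenceRecord:
    """One pairwise comparison (hypothetical name, for illustration only)."""

    snippet_a: str
    snippet_b: str
    label: int  # assumed convention: 1 if A is preferred, 0 if B is preferred

    def __post_init__(self) -> None:
        # A binary encoding admits exactly two outcomes.
        if self.label not in (0, 1):
            raise ValueError("label must be 0 or 1")

    def chosen_rejected(self) -> Tuple[str, str]:
        """Return the pair ordered as (preferred, non-preferred)."""
        if self.label == 1:
            return self.snippet_a, self.snippet_b
        return self.snippet_b, self.snippet_a


# The expert judged 'A is better than B', so the label is 1.
record = PreferenceRecord(
    snippet_a="The capital of France is Paris.",
    snippet_b="France is a country.",
    label=1,
)
chosen, rejected = record.chosen_rejected()
```

Storing the same judgment as an ordered (preferred, non-preferred) pair carries identical information; the 0/1 label simply keeps the original A/B presentation order intact.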

Updated 2025-10-06
