Comparison of Annotation Methods for Human Feedback in RLHF
When collecting human feedback in RLHF, there are two primary methods for having annotators evaluate model-generated outputs. One approach, the pointwise method, is to have annotators assign a direct numerical rating to each output, which frames reward model training as a regression problem. In practice this is challenging because it is difficult to establish a consistent, universally accepted scoring standard across annotators. A more popular and simpler alternative is to have annotators rank the outputs by preference, either by comparing outputs in pairs (pairwise comparison) or by ordering the whole set (listwise ranking); expressing relative preferences is an easier and more reliable task for humans.
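
To make the contrast concrete, below is a minimal sketch of the two training objectives these annotation styles imply for the reward model: a regression loss on absolute ratings for the pointwise method, and a Bradley-Terry-style preference loss on pairwise comparisons for the ranking method. It assumes PyTorch, and the tensor names and reward values are illustrative assumptions rather than anything taken from the original material.

```python
import torch
import torch.nn.functional as F

# Illustrative reward-model outputs r(prompt, response); the numbers below
# are made up for demonstration only.

# Pointwise (rating) annotation: the reward model is trained as a regressor
# to match the absolute scores annotators assigned to each output.
predicted_rewards = torch.tensor([0.8, 0.2, 0.5])
annotator_ratings = torch.tensor([1.0, 0.0, 0.5])
pointwise_loss = F.mse_loss(predicted_rewards, annotator_ratings)

# Pairwise (ranking) annotation: annotators only state which of two outputs
# they prefer; a common objective (the Bradley-Terry model) pushes the reward
# of the preferred output above the reward of the rejected one.
reward_chosen = torch.tensor([0.8, 0.5])     # r(.) for preferred outputs
reward_rejected = torch.tensor([0.2, 0.1])   # r(.) for rejected outputs
pairwise_loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()

print(f"pointwise (regression) loss: {pointwise_loss.item():.4f}")
print(f"pairwise (preference) loss:  {pairwise_loss.item():.4f}")
```

The pairwise objective only requires annotators to express a relative preference rather than agree on an absolute scale, which is why ranking-based collection tends to yield more consistent labels.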

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
A development team is refining a large language model to be more helpful and safe using feedback from human evaluators. For the prompt, 'Explain the water cycle for a 10-year-old,' the model generates four different responses:
- 'Rain falls, flows to the sea, evaporates into clouds, and rains again.'
- 'Imagine water goes on a big trip! It falls from clouds as rain, runs into rivers, then the sun warms it up until it floats back into the sky to make new clouds.'
- 'The water cycle describes the continuous movement of water on, above, and below the surface of the Earth. Key stages are evaporation, condensation, precipitation, and collection.'
- 'Water evaporates from oceans, forms clouds through condensation, falls back to Earth as precipitation, and is collected in bodies of water to start over.'
In the context of this training process, what is the primary role of this set of four responses?
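In practice, such a set of candidate responses is packaged together with its prompt and handed to annotators, who return a preference ordering rather than absolute scores. The sketch below shows one hypothetical data layout for this; the field names and the ranking values are illustrative assumptions, not a format prescribed by the source.

```python
# Hypothetical layout of one annotation item: a prompt, its candidate
# responses, and the preference ordering an annotator might return.
annotation_item = {
    "prompt": "Explain the water cycle for a 10-year-old.",
    "responses": [
        "Rain falls, flows to the sea, evaporates into clouds, and rains again.",
        "Imagine water goes on a big trip! It falls from clouds as rain, ...",
        "The water cycle describes the continuous movement of water ...",
        "Water evaporates from oceans, forms clouds through condensation, ...",
    ],
    # Indices of responses ordered from most to least preferred
    # (this ranking is illustrative, not an actual judgement).
    "preference_ranking": [1, 3, 0, 2],
}

# The top-ranked response according to the annotator's ordering.
print(annotation_item["responses"][annotation_item["preference_ranking"][0]])
```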
Evaluating Output Sets for Human Feedback
Formulating the Loss Function for Policy Learning in RLHF
You are tasked with preparing a dataset for a human feedback-based model tuning process. The initial dataset consists only of user prompts. Arrange the following actions into the correct chronological sequence to create the initial set of data for human evaluation.
Learn After
Reward Model Learning in RLHF
Pairwise Comparison for Human Feedback in RLHF
Listwise Ranking for Human Feedback in RLHF
Preference Notation in Human Feedback
Pointwise Method (Rating) for Human Feedback in RLHF
Evaluating a Human Feedback Strategy
A research team is developing a system to improve a language model using feedback from a large, diverse group of non-expert annotators. The team's primary goal is to ensure the feedback data is as consistent and reliable as possible, even with minimal training for the annotators. Which of the following feedback collection strategies would best achieve this goal, and why?
Trade-offs in Human Feedback Collection Methods