A research team is developing a reward model to score the quality of AI-generated poetry. Their team of human labelers consists of literary experts from diverse cultural backgrounds, leading to highly subjective and varied opinions on what constitutes 'good' poetry. Given this context, which of the following methods for collecting human feedback would likely introduce the most noise and inconsistency into the reward model's training data?
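The contrast the question turns on: annotators with different internal scales disagree on absolute numbers far more than on rankings, which is why RLHF pipelines typically collect relative preferences instead of scores. Below is a minimal sketch of the standard Bradley-Terry pairwise loss used with preference data; it is illustrative only, and the tensors of hypothetical poem rewards are not from the card itself.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(r_chosen: torch.Tensor,
                             r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry objective common in RLHF reward modeling.

    Pushes the reward of the preferred response above that of the
    rejected one. Only the relative judgment "A is better than B" is
    needed, which tends to be more consistent across annotators than
    an absolute score on a shared numeric scale.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Hypothetical rewards for three (chosen, rejected) poem pairs.
r_chosen = torch.tensor([1.2, 0.4, 2.0])
r_rejected = torch.tensor([0.3, 0.9, 1.1])
loss = pairwise_preference_loss(r_chosen, r_rejected)
```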
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Pointwise Loss Function for Reward Model Training
Limitations of the Pointwise Method in RLHF
Comparison of Pointwise vs. Relative Preference Methods in RLHF
Suitable Applications for the Pointwise Method in RLHF
Negative Mean Squared Error Objective for Pointwise Reward Models
Conceptual Advantages of Pointwise Methods in RLHF
A team is training a reward model for a language model. They collect human feedback by presenting annotators with a single, model-generated response to a prompt and asking them to assign a quality score on a scale of 1 to 10. How does this data collection approach frame the learning task for the reward model? (See the sketch after this list.)
Choosing a Feedback Collection Method
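The single-response scoring setup described in the question above frames reward modeling as pointwise regression: each response receives an independent target score, and the model minimizes squared error against it, which is equivalent to maximizing the negative-MSE objective named in the related cards. A minimal sketch under that assumption follows; the variable names and example values are hypothetical, not from the course.

```python
import torch
import torch.nn.functional as F

def pointwise_regression_loss(predicted_rewards: torch.Tensor,
                              annotator_scores: torch.Tensor) -> torch.Tensor:
    """Pointwise objective: regress the reward model's scalar output
    onto the absolute human score given to each response in isolation.
    Minimizing this MSE is equivalent to maximizing negative MSE."""
    return F.mse_loss(predicted_rewards, annotator_scores)

# Hypothetical batch: model outputs vs. annotator scores (1-10 scale).
predicted = torch.tensor([6.5, 3.2, 8.1])
scores = torch.tensor([7.0, 4.0, 9.0])
loss = pointwise_regression_loss(predicted, scores)
```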