Justification for Using RLHF over Supervised Learning
Reinforcement Learning from Human Feedback (RLHF) is often preferred over standard supervised learning for model alignment because of fundamental difficulties in data annotation. With supervised methods, it is hard for humans to articulate complex values and goals, and harder still to demonstrate them by authoring perfectly aligned outputs. RLHF addresses this by shifting the human task from difficult demonstration to the simpler act of expressing preferences over a set of model-generated candidates. This preference data is then used to train a reward model that captures human values. RLHF also offers an exploration advantage: by sampling, it can generate and evaluate outputs beyond the original annotated dataset, potentially discovering superior policies.
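To make the reward-model step concrete, here is a minimal sketch of how ranked preference pairs can be turned into a scalar reward model, assuming PyTorch and the Bradley-Terry pairwise loss commonly used in RLHF pipelines; the `RewardModel` class, its feature inputs, and the toy batch are hypothetical stand-ins rather than details from this note.

```python
# Minimal sketch (assumed, not from this note) of reward-model training on
# pairwise human preferences via the Bradley-Terry loss used in many RLHF setups.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy scalar reward head over a fixed-size feature vector standing in
    for an LLM's representation of a (prompt, response) pair."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.head(features).squeeze(-1)  # one scalar reward per example

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize the log-probability that the
    # human-preferred response scores higher than the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Hypothetical batch: features for responses annotators preferred vs. rejected.
torch.manual_seed(0)
chosen_feats = torch.randn(8, 16)
rejected_feats = torch.randn(8, 16)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

optimizer.zero_grad()
loss = preference_loss(model(chosen_feats), model(rejected_feats))
loss.backward()
optimizer.step()
print(f"pairwise preference loss: {loss.item():.4f}")
```

In a full pipeline the features would come from an LLM backbone encoding each (prompt, response) pair, and the trained reward model would then supply the reward signal for PPO-style policy updates, as discussed in the related notes below.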
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Historical Development of RLHF
Policy Learning in RLHF
Justification for Using RLHF over Supervised Learning
Bridging Language Modeling and Reinforcement Learning Notations in RLHF
Architectural Components of an RLHF System
Three-Stage Training Process of RLHF
Refinements and Alternatives to RLHF
Rationale for End-of-Sequence Rewards in RLHF
High-Level Process of RLHF with PPO
Limitations of Human Feedback in LLM Alignment
Computational and Stability Challenges of RLHF
Goal of RLHF
Origin and Application of RLHF
Dual Learning Tasks of RLHF: Reward and Policy Learning
Four-Stage Process of Reinforcement Learning from Human Feedback (RLHF)
RLHF Training Process with PPO
An AI development team is considering two different methods for training a conversational assistant to be more helpful and aligned with user expectations. Method 1 involves having human experts write a large dataset of ideal, high-quality responses to various prompts, and then training the AI to imitate these examples. Method 2 involves having the AI generate several responses to each prompt, and then asking human experts to simply rank these responses from best to worst. This ranking data is then used to train a separate 'preference model' that provides a reward signal to guide the AI's learning process. Which statement best analyzes the primary advantage of Method 2 over Method 1?
LLM as the Agent in RLHF
Reward Model as an Environment Proxy in RLHF
A team is using human feedback to improve a language model's ability to follow instructions safely and helpfully. Arrange the following high-level stages of this process into the correct chronological order.
RLHF Objective Function
Comparison of Objectives: Supervised Fine-Tuning vs. RLHF
Evaluating a Training Method for a High-Stakes Application
Diagnosing Instability in an RLHF + PPO Training Run
Choosing and Justifying an RLHF Objective Under Competing Product Constraints
Interpreting Conflicting RLHF Signals: Reward Model Ranking vs. PPO Updates Under KL Regularization
Root-Cause Analysis of a “Reward Hacking” Spike During RLHF with PPO
Tuning an RLHF + PPO Update When Reward Improves but Behavior Regresses
Post-Deployment Drift After RLHF: Diagnosing Reward Model and PPO/KL Interactions
Designing an RLHF Training Blueprint for a Regulated Customer-Support LLM
You’re running an RLHF fine-tuning job for an inte...
You are reviewing an RLHF training run for an inte...
Your team is running RLHF for a customer-facing LL...
Learn After
Annotation Simplicity in RLHF: Recognition over Demonstration
Exploration Advantage of RLHF
Dataset Composition for RL Fine-Tuning in RLHF
A development team aims to fine-tune a language model to be 'helpful and harmless'—qualities that are nuanced and difficult to exemplify perfectly. They consider two strategies:
- Supervised Approach: Have human experts write ideal, 'gold-standard' responses to a wide range of prompts for the model to imitate.
- Preference-Based Approach: Have the model generate multiple responses to each prompt, and then have human experts rank these responses from best to worst.
What is the primary reason that the preference-based approach is often more effective for aligning a model with such complex human values?
Improving a Sarcasm-Detecting AI
Limitations of Static Datasets in Model Fine-Tuning