Learning from Human Feedback
Learning from human feedback is an alignment method applied after pre-training and supervised fine-tuning to reduce the risk that a model generates inaccurate, biased, or harmful content. The process collects human evaluations of the model's responses to a variety of inputs, where evaluators judge the outputs against human preferences and values. This feedback is then used to further train the model, improving its alignment with user expectations.
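One common way to turn such human evaluations into a training signal is a pairwise preference (Bradley-Terry style) loss, as used when fitting a reward model in RLHF. The sketch below is illustrative only; the function name and scores are assumptions, not from the source.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry style pairwise loss: -log sigmoid(r_chosen - r_rejected).
    # Training on this pushes the reward model to score the
    # human-preferred response higher than the rejected one.
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical scores a reward model might assign to two responses.
loss_ordered = preference_loss(2.0, 0.5)      # preferred response scored higher
loss_misordered = preference_loss(0.5, 2.0)   # preferred response scored lower
print(loss_ordered < loss_misordered)         # mis-ordered pairs incur a larger loss
```

Note that when the two scores are equal the loss is log 2, so any correct ordering strictly reduces it.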
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Learning from Human Feedback
A development team trains a large language model on a vast dataset of high-quality, curated instruction-and-response pairs to create a helpful chatbot. After this training, they observe that while the model answers most questions correctly, it occasionally generates responses that are subtly biased or confidently presents outdated, incorrect information when faced with novel or ambiguous user queries. Which of the following statements best analyzes the fundamental limitation demonstrated by the model's behavior?
Evaluating a Chatbot's Training Limitations
Analyzing Model Behavior After Instruction-Based Training
Learn After
Reinforcement Learning from Human Feedback (RLHF)
A development team is working on an AI assistant. After its initial training, they find that while the assistant's answers are factually accurate, they are often perceived as blunt or unhelpful. To address this, the team decides to use a process where human evaluators are shown a user's prompt followed by two or more different responses generated by the assistant. Which of the following tasks, given to the human evaluators, would be most effective for refining the model's helpfulness and tone?
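The evaluator task described above, ranking two or more candidate responses, is typically converted into pairwise comparison records for reward-model training. A minimal sketch, assuming evaluators return responses ordered best-first (the function name and example strings are hypothetical):

```python
from itertools import combinations

def ranking_to_pairs(prompt, ranked_responses):
    # Convert a human evaluator's ranking (best first) into
    # (prompt, chosen, rejected) pairs: every higher-ranked response
    # is "chosen" relative to every response ranked below it.
    return [(prompt, ranked_responses[i], ranked_responses[j])
            for i, j in combinations(range(len(ranked_responses)), 2)]

pairs = ranking_to_pairs(
    "How do I reset my password?",
    ["Polite step-by-step answer",
     "Terse but correct answer",
     "Dismissive answer"],
)
# A ranking of 3 responses yields 3 pairwise comparisons.
```

This is why ranking is more effective than asking evaluators for absolute scores: a single ranking of n responses yields n(n-1)/2 consistent comparisons.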
Addressing Post-Tuning Model Flaws
An AI development team wants to improve a pre-trained model's alignment by making its responses more helpful and less likely to be harmful. Arrange the core steps of the process for incorporating human evaluations into this refinement stage.
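The step-ordering the card above asks for can be sketched as follows. This is the canonical RLHF ordering; the step wording here is a summary, not the card's answer key.

```python
def rlhf_steps():
    # Canonical ordering of the human-feedback refinement stage.
    return [
        "1. Sample the model's responses to a set of prompts",
        "2. Collect human rankings/preferences over those responses",
        "3. Train a reward model on the preference data",
        "4. Optimize the LLM against the reward model with RL (e.g. PPO)",
    ]

for step in rlhf_steps():
    print(step)
```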