Activity (Process)

Four-Stage Process of Reinforcement Learning from Human Feedback (RLHF)

The Reinforcement Learning from Human Feedback (RLHF) framework can be conceptualized as a four-stage pipeline. The process begins with (a) training an initial language model, or policy, typically through pre-training followed by instruction fine-tuning (also called supervised fine-tuning). In the second stage (b), this model generates multiple outputs for various inputs, and human annotators compare and rank these outputs to produce preference data. In the third stage (c), the collected rankings are used to train a reward model that learns to score responses according to human judgments. In the final stage (d), the language model policy is further fine-tuned with reinforcement learning, with the trained reward model supplying the reward signal that aligns outputs with human preferences.
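The core of stages (c) and (d) can be sketched in miniature. The toy code below is an illustrative assumption, not the course's implementation: responses are represented as small feature vectors, a linear reward model is fitted to pairwise human preferences with the standard Bradley–Terry logistic loss, and the resulting reward scores are then used to rank candidate outputs (a real pipeline would instead update the policy with an RL algorithm such as PPO). All data, function names, and dimensions here are made up for illustration.

```python
import math

# Stage (b)/(c) sketch: each "response" is a toy feature vector, and human
# raters prefer the first element of each pair over the second. A linear
# reward model r(x) = w . x is trained with the Bradley-Terry pairwise loss
#   L = -log sigmoid(r(chosen) - r(rejected)).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """pairs: list of (chosen_features, rejected_features) tuples."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Gradient of -log sigmoid(margin) with respect to the margin:
            grad_scale = sigmoid(margin) - 1.0
            for k in range(dim):
                w[k] -= lr * grad_scale * (chosen[k] - rejected[k])
    return w

# Toy preference data: raters consistently prefer a large first feature.
pairs = [([1.0, 0.2], [0.1, 0.9]),
         ([0.8, 0.5], [0.3, 0.4]),
         ([0.9, 0.1], [0.2, 0.8])]
w = train_reward_model(pairs, dim=2)

# Stage (d) sketch: the reward model scores candidate outputs; RL fine-tuning
# would push the policy toward high-reward responses (here we simply rank).
candidates = [[0.95, 0.1], [0.1, 0.95]]
scores = [reward(w, c) for c in candidates]
best = candidates[scores.index(max(scores))]
print(best)  # the candidate with the large first feature scores highest
```

The point of the sketch is the division of labor: preference pairs train the reward model once, and that frozen scorer then supervises the policy, so no further human labels are needed during the RL stage.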


Updated 2026-04-20


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
