Reward Model as an Environment Proxy in RLHF
In Reinforcement Learning from Human Feedback (RLHF), the reward model acts as a substitute for the environment: because collecting human judgments for every sampled output is impractical, a learned model stands in for the human evaluator. For every output sequence the agent generates, the reward model produces a single scalar score, known as the reward. This score quantifies the output's quality and signals to the agent how desirable its actions were.
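To make the scoring step concrete, here is a minimal sketch assuming a learned scalar head on top of a sequence encoder; `RewardModel`, the GRU encoder, and the tensor shapes are illustrative assumptions, not a reference implementation:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a generated sequence to a single scalar reward."""
    def __init__(self, hidden_size: int):
        super().__init__()
        # Stand-in encoder; in practice this is usually a
        # pretrained transformer shared with the policy.
        self.encoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_size)
        _, last_hidden = self.encoder(token_embeddings)
        # One score per sequence, read off the final hidden state,
        # mirroring the end-of-sequence reward used in RLHF.
        return self.value_head(last_hidden[-1]).squeeze(-1)

rm = RewardModel(hidden_size=64)
outputs = torch.randn(2, 10, 64)   # two fake generated sequences
rewards = rm(outputs)              # one scalar reward per sequence
print(rewards.shape)               # torch.Size([2])
```

Each generated sequence maps to one scalar, which the RL algorithm (e.g. PPO) then treats exactly as it would a reward returned by a real environment.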
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Historical Development of RLHF
Policy Learning in RLHF
Justification for Using RLHF over Supervised Learning
Bridging Language Modeling and Reinforcement Learning Notations in RLHF
Architectural Components of an RLHF System
Three-Stage Training Process of RLHF
Refinements and Alternatives to RLHF
Rationale for End-of-Sequence Rewards in RLHF
High-Level Process of RLHF with PPO
Limitations of Human Feedback in LLM Alignment
Computational and Stability Challenges of RLHF
Goal of RLHF
Origin and Application of RLHF
Dual Learning Tasks of RLHF: Reward and Policy Learning
Four-Stage Process of Reinforcement Learning from Human Feedback (RLHF)
RLHF Training Process with PPO
An AI development team is considering two different methods for training a conversational assistant to be more helpful and aligned with user expectations. Method 1 involves having human experts write a large dataset of ideal, high-quality responses to various prompts, and then training the AI to imitate these examples. Method 2 involves having the AI generate several responses to each prompt, and then asking human experts to simply rank these responses from best to worst. This ranking data is then used to train a separate 'preference model' that provides a reward signal to guide the AI's learning process. Which statement best analyzes the primary advantage of Method 2 over Method 1?
LLM as the Agent in RLHF
A team is using human feedback to improve a language model's ability to follow instructions safely and helpfully. Arrange the following high-level stages of this process into the correct chronological order.
RLHF Objective Function
Comparison of Objectives: Supervised Fine-Tuning vs. RLHF
Evaluating a Training Method for a High-Stakes Application
Diagnosing Instability in an RLHF + PPO Training Run
Choosing and Justifying an RLHF Objective Under Competing Product Constraints
Interpreting Conflicting RLHF Signals: Reward Model Ranking vs. PPO Updates Under KL Regularization
Root-Cause Analysis of a “Reward Hacking” Spike During RLHF with PPO
Tuning an RLHF + PPO Update When Reward Improves but Behavior Regresses
Post-Deployment Drift After RLHF: Diagnosing Reward Model and PPO/KL Interactions
Designing an RLHF Training Blueprint for a Regulated Customer-Support LLM
You’re running an RLHF fine-tuning job for an inte...
You are reviewing an RLHF training run for an inte...
Your team is running RLHF for a customer-facing LL...
Learn After
A development team is training a large language model to be a helpful assistant. Their process involves two stages:
- They train a 'scoring model' on a dataset of human-ranked conversations. The goal of this scoring model is to predict which of two responses a human would prefer, assigning a numerical score.
- They then use this scoring model to automatically provide feedback to the main language model, rewarding it for generating responses that receive a high score.
After extensive training using this method, the team observes that the main language model produces responses that are consistently very long and use excessively polite and elaborate phrasing, even when a short, direct answer would be more helpful. These long, polite responses always receive very high scores from the scoring model.
Which of the following statements best evaluates the fundamental issue with this training setup?
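For context, a scoring model of this kind is commonly fit to the ranked pairs with a pairwise objective. A minimal sketch, assuming a Bradley-Terry style loss; the function name and example tensors are hypothetical:

```python
import torch
import torch.nn.functional as F

def preference_loss(chosen_scores: torch.Tensor,
                    rejected_scores: torch.Tensor) -> torch.Tensor:
    # Maximize the margin by which the preferred response outscores
    # the rejected one: -log sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

chosen = torch.tensor([2.1, 0.3, 1.5])    # scores of preferred responses
rejected = torch.tensor([1.0, 0.8, 1.4])  # scores of rejected responses
print(preference_loss(chosen, rejected))  # scalar training loss
```

Note that this objective only enforces relative ordering; nothing in it penalizes verbosity or excessive politeness as such, which is why systematic quirks in the ranking data can be amplified, as in the scenario above.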
The Role of the Reward Model in Scalable Training
In the process of fine-tuning a language model using feedback, the problem is often framed using concepts from a general learning paradigm. Match each component from this general paradigm to its specific implementation in the language model fine-tuning process.