Definition

Reward Model as an Environment Proxy in RLHF

In Reinforcement Learning from Human Feedback (RLHF), the reward model acts as a substitute for the environment: because querying human raters for every generated output is impractical, a model trained on human preference data stands in for them. For every output sequence the agent generates, the reward model assigns a numerical score, known as the reward. This score quantifies the output's quality and informs the agent how desirable its actions were.
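
A minimal sketch of what such a proxy could look like, assuming a toy GRU encoder in place of the pretrained transformer typically used in practice; the class name, dimensions, and architecture here are illustrative choices, not any particular library's API:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps an output sequence to a single scalar reward (a sketch)."""
    def __init__(self, vocab_size: int = 32000, hidden_dim: int = 256):
        super().__init__()
        # In real RLHF systems the encoder is a pretrained transformer;
        # a GRU keeps this example small and self-contained.
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.score_head = nn.Linear(hidden_dim, 1)  # scalar reward per sequence

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) token IDs of the agent's outputs
        h = self.embed(token_ids)          # (batch, seq_len, hidden_dim)
        _, last = self.encoder(h)          # final hidden state summarizes the sequence
        return self.score_head(last[-1]).squeeze(-1)  # (batch,) rewards

# Score a batch of generated sequences instead of querying a real
# environment or a human rater for each one.
rm = RewardModel()
outputs = torch.randint(0, 32000, (4, 16))  # four hypothetical output sequences
rewards = rm(outputs)                       # one scalar reward per sequence
```

In a full RLHF loop, these scalar rewards would feed a policy-gradient update (e.g., PPO) in place of rewards returned by an actual environment.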


Tags

Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
