1Cademy - The Role of the Reward Model in Scalable Training

Learn Before

Reward Model as an Environment Proxy in RLHF

Short Answer

The Role of the Reward Model in Scalable Training

In the reinforcement learning stage of training a large language model with human feedback (RLHF), why is a previously trained reward model used to score each generated output, instead of asking a human to score every single generation in real time?

Updated 2025-10-03

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences