Learn Before
Training of Reward Models
A critical component of certain reinforcement learning frameworks, such as reinforcement learning from human feedback (RLHF), is the reward model, which must be trained to accurately reflect the desired outcomes (e.g., human preferences). Training this model is a distinct step that precedes its use in training the value function and the policy.
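As a minimal sketch of this separate training step, the snippet below fits a toy linear reward model on pairwise preference data using the Bradley-Terry style loss, minimizing the negative log-sigmoid of the score margin between the preferred and rejected response. The feature vectors, synthetic preference pairs, and hyperparameters are illustrative assumptions, not taken from the source; in practice the scorer would be a language model head over response representations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "responses" as feature vectors; in a real system these would be
# model representations of prompt-response pairs (an assumption here).
dim = 4
w = np.zeros(dim)  # reward model parameters: r(x) = w . x

# Synthetic preference data: a hidden "true" preference direction
# decides which of the two responses the annotator prefers.
true_w = rng.normal(size=dim)
pairs = []
for _ in range(200):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    chosen, rejected = (a, b) if true_w @ a >= true_w @ b else (b, a)
    pairs.append((chosen, rejected))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train by minimizing -log sigmoid(r(chosen) - r(rejected)) with SGD.
lr = 0.1
for epoch in range(50):
    for chosen, rejected in pairs:
        margin = w @ chosen - w @ rejected
        grad = -(1.0 - sigmoid(margin)) * (chosen - rejected)
        w -= lr * grad

# After training, the model should rank chosen above rejected on most pairs.
accuracy = np.mean([float(w @ c > w @ r) for c, r in pairs])
print(f"pairwise accuracy: {accuracy:.2f}")
```

Only once this scoring model fits the preference data well is it frozen and used downstream to supply rewards when optimizing the policy (and its value function).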
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pros and Cons of Actor-Critic Method
DQN
DDPG
Role of the Critic in Advantage Function Calculation
Robotic Chef Learning Paradigm
An autonomous agent is at a specific position in a grid world and must choose one of four directions to move (up, down, left, right). A purely value-based agent would estimate the long-term value of moving in each of the four directions and deterministically choose the direction with the highest estimated value. How does the decision-making process of an agent using an actor-critic method fundamentally differ in this same situation?
Definition of the Advantage Function
Training of Reward Models
In a reinforcement learning framework that separates the decision-making process from the evaluation process, there are two key components. Match each component to its primary function and the nature of its output.
Advantage Actor-Critic (A2C) Method
Learn After
A development team has a pre-trained language model and wants to fine-tune it to produce responses that are more helpful and safe. Their strategy involves first creating a separate model whose sole job is to score how good a given response is, based on human preferences. Which of the following best describes the data and objective used to train this specific 'scoring' model?
You are tasked with aligning a large language model to better follow human preferences using a reward-based approach. Arrange the following high-level stages of the process into the correct chronological order.
Diagnosing Reward Model Failure
Rating LLM Outputs for Reward Models
Challenges of Rating LLM Outputs