Learn Before
General Loss Minimization Objective for Reward Model Training
In Reinforcement Learning from Human Feedback, the training of the reward model is framed as a loss minimization problem. The general objective is to minimize a loss function, denoted as L, which depends on the input prompt (x), a set of model-generated outputs (e.g., {y1, y2}), and the reward model (r) itself. By minimizing this loss function, the reward model learns to assign scores to outputs in a manner that reflects the collected human preference data.
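For concreteness, below is a minimal PyTorch-style sketch of one common instantiation of this objective, a pairwise ranking loss over preference pairs. The names pairwise_reward_loss, train_step, and reward_model are hypothetical; the sketch assumes the reward model is a callable (e.g., a torch module) that returns a scalar score for a (prompt, output) pair, and that y1 is the human-preferred output in each pair.

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(reward_model, x, y1, y2):
    """Loss L(x, {y1, y2}, r) for a single preference pair.

    Assumes y1 is the human-preferred output. The loss is the negative
    log-sigmoid of the score margin, so minimizing it pushes the reward
    model to score y1 above y2 (a Bradley-Terry style pairwise loss).
    """
    r1 = reward_model(x, y1)  # scalar score for the preferred output
    r2 = reward_model(x, y2)  # scalar score for the less-preferred output
    return -F.logsigmoid(r1 - r2)

def train_step(reward_model, optimizer, batch):
    """One gradient step minimizing the average loss over a batch of
    (prompt, preferred output, rejected output) triples."""
    optimizer.zero_grad()
    losses = [pairwise_reward_loss(reward_model, x, y1, y2)
              for (x, y1, y2) in batch]
    loss = torch.stack(losses).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Other loss functions fit the same template: any differentiable L that takes the prompt, the candidate outputs, and the reward model's scores, and decreases as the scores agree more closely with the human preference labels.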
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Policy Learning in RLHF
Dual Role of the RLHF Reward Model: Ranking-based Training for Scoring Application
Relation between Verifiers and RLHF Reward Models
General Loss Minimization Objective for Reward Model Training
Architecture and Function of the RLHF Reward Model
Reward Model Training as a Ranking Problem in RLHF
Underdetermined Model
Limitations of Outcome-Based Rewards for Entire Sequences
Training a Reward Model with Preference Data
Converting Listwise Rankings to Pairwise Preferences for Reward Model Training
Diagnosing Undesired Model Behavior
An AI team is training a reward model using a dataset where, for each prompt, human annotators have ranked several generated responses from best to worst. What is the fundamental task the reward model is being trained to perform based on this specific type of data?
An AI development team is training a model to act as a helpful assistant. They create a dataset where, for each user prompt, human evaluators are shown two different generated responses and asked to choose which one is better. The model is then trained on this dataset of pairwise preferences. After training, the team observes that the model consistently assigns higher scores to longer, more detailed responses, even when they are less helpful or contain irrelevant information. Which of the following is the most likely explanation for this emergent behavior?
Ranking LLM Outputs as an Alternative to Rating
Regularization in RLHF Reward Model Training
Complexity of Reward Model Training in RLHF
Learn After
Reward Model Training via Ranking Loss Minimization
A team is training a neural network to evaluate the quality of different text outputs generated in response to a prompt. The training data consists of many examples, where each example includes a prompt, a pair of generated text outputs (Output A and Output B), and a label indicating which output was preferred by a human evaluator. The network's goal is to learn to assign a single numerical score to any given output. Which of the following best describes the fundamental objective that guides the adjustment of the network's parameters during this training process?
Optimizing an AI Quality Scorer
The Role of a Loss Function in Reward Model Training