Function and Inputs of the RLHF Reward Model
Within the Reinforcement Learning from Human Feedback (RLHF) framework, the reward model is a neural network whose job is to map a pair of token sequences, an input (prompt) x and the corresponding output (response) y, to a single scalar value that represents the reward for that pair.
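As a rough illustration, the following PyTorch sketch builds such a mapping on top of a small Transformer encoder. The class name, toy dimensions, and the choice to read the score off the final position are assumptions made for the example, not details stated on this card.

```python
# Minimal sketch (assumed architecture): a reward model that reads the
# concatenated prompt x and response y and emits one scalar reward.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Scalar head: maps a hidden state to one number, the reward.
        self.reward_head = nn.Linear(d_model, 1)

    def forward(self, x_tokens, y_tokens):
        # The reward model scores the pair (x, y), so both sequences are
        # concatenated into a single input before encoding.
        tokens = torch.cat([x_tokens, y_tokens], dim=1)        # (batch, |x|+|y|)
        hidden = self.backbone(self.embed(tokens))              # (batch, len, d_model)
        # Read the reward off the last position, which has seen the whole pair.
        return self.reward_head(hidden[:, -1, :]).squeeze(-1)   # (batch,)

# Usage: score one (prompt, response) pair with toy token ids.
rm = RewardModel()
x = torch.randint(0, 1000, (1, 12))   # prompt tokens
y = torch.randint(0, 1000, (1, 20))   # response tokens
print(rm(x, y))                        # e.g. tensor([0.14], grad_fn=...)
```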

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Reward Model Implementation using a Pre-trained LLM
Troubleshooting a Reward Model's Architecture
Both a standard generative language model and an RLHF reward model are often based on the same core architecture (e.g., a Transformer decoder). What is the key architectural modification that allows the reward model to produce a single scalar quality score for a given text, rather than generating a new sequence of text? (A code sketch of this head swap appears after this list.)
Adapting a Language Model for Reward Prediction
Sequence-Level Evaluation in Reward Models
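On the architecture question above: the usual modification is to drop the vocabulary-sized next-token prediction head and attach a small head that outputs a single scalar. A hedged sketch using Hugging Face transformers follows; the choice of gpt2 and the num_labels=1 head are illustrative assumptions, not details taken from this card.

```python
# Illustrative sketch: reuse a pre-trained decoder backbone, but give it a
# single-output head instead of the vocabulary-sized LM head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# num_labels=1 attaches a head with one output, so the model returns a scalar
# score for the whole sequence rather than next-token logits.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
model.config.pad_token_id = tokenizer.eos_token_id  # gpt2 defines no pad token

text = "Prompt: explain RLHF.\nResponse: RLHF fine-tunes a model with human feedback."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits.squeeze()  # one scalar for the (x, y) pair
print(reward)
```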
Learn After
Notation for the RLHF Reward Model
A system designed to improve language model outputs uses a special component. This component takes a user's initial text (a prompt) and a model-generated response, then outputs a single numerical score. If this component processes two different responses for the exact same prompt, giving 'Response A' a score of 4.1 and 'Response B' a score of -0.5, what is the most accurate interpretation of these scores? (A worked reading of such scores appears after this list.)
Identifying Reward Model Inputs and Output
Troubleshooting a Flawed Reward Model
Semantic Completeness in RLHF Reward Models
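On the score-interpretation question above: reward values are comparative rather than absolute, so for the same prompt the 4.1 response is simply judged better than the -0.5 one; the raw numbers are not grades or probabilities on their own. Under the commonly used Bradley-Terry reading of reward gaps (an assumption here, not something stated on this card), the difference can be converted into a preference probability:

```python
import math

r_a, r_b = 4.1, -0.5  # rewards for Response A and Response B on the same prompt
# Bradley-Terry style reading: probability that A is preferred over B.
p_a_over_b = 1 / (1 + math.exp(-(r_a - r_b)))
print(f"P(A preferred over B) = {p_a_over_b:.3f}")  # ~0.990
```

A gap of 4.6 corresponds to near-certain preference for Response A; only differences between scores for the same prompt carry meaning.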