Definition

Function and Inputs of the RLHF Reward Model

Within the Reinforcement Learning from Human Feedback (RLHF) framework, the reward model is a neural network whose job is to score responses. It takes a pair of token sequences, an input x (the prompt) and a corresponding output y (a candidate response), and maps them to a single scalar value, the reward r(x, y), which estimates how well y answers x according to human preferences.
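This mapping from a sequence pair to a scalar can be sketched in a few lines. The snippet below is a toy illustration only, not a real RLHF reward model: it assumes a randomly initialized embedding table and a linear "value head", and pools the concatenated pair by averaging. Real reward models use a pretrained transformer backbone instead of this pooling, but the input/output contract, two token sequences in, one scalar out, is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE, HIDDEN = 100, 16

# Hypothetical toy parameters: an embedding table and a linear
# value head that projects a pooled representation to one scalar.
embedding = rng.normal(size=(VOCAB_SIZE, HIDDEN))
value_head_w = rng.normal(size=(HIDDEN,))
value_head_b = 0.0

def reward(x_tokens, y_tokens):
    """Map an (input, output) token-sequence pair to a scalar reward.

    The pair (x, y) is concatenated, embedded, mean-pooled, and
    projected by the value head. Stand-in for a transformer backbone.
    """
    tokens = np.asarray(list(x_tokens) + list(y_tokens))
    pooled = embedding[tokens].mean(axis=0)             # shape (HIDDEN,)
    return float(pooled @ value_head_w + value_head_b)  # single scalar

# Example: score a prompt/response pair given as token ids.
r = reward([1, 5, 7], [2, 9])
print(type(r).__name__, r)
```

The key design point this illustrates is the signature: whatever architecture sits in the middle, the reward model consumes (x, y) and emits one number, which the RL stage then maximizes.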

Updated 2026-05-01

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences