Distinguishing Model Outputs in Preference Alignment
In a system that uses reinforcement learning to align a language model with human preferences, two key components are a 'reward model' and a 'value model'. Both often share a similar underlying architecture, taking a sequence of text as input and producing a single scalar as output. Explain the fundamental difference between what the scalar output of the reward model represents and what the scalar output of the value model represents.
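A minimal sketch may help make the contrast concrete. The code below is illustrative, not from any particular library: the names (ScalarHead, ToyBackbone, reward_of, value_of) are hypothetical, and the toy backbone stands in for a pretrained LM trunk. The point is that the two models can share the same "text in, scalar out" architecture, while the reward model's scalar is typically read once on a complete prompt-response pair and the value model's scalar is queried at intermediate states during generation.

```python
import torch
import torch.nn as nn


class ToyBackbone(nn.Module):
    """Stand-in for a pretrained LM trunk; returns per-token hidden states."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.embed(input_ids)  # (batch, seq_len, hidden_size)


class ScalarHead(nn.Module):
    """Shared architecture: a backbone plus a linear layer mapping each
    hidden state to one scalar. Both the reward model and the value model
    can be built this way; only the training target and point of use differ."""

    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone
        self.scalar = nn.Linear(hidden_size, 1)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)       # (batch, seq_len, hidden_size)
        return self.scalar(hidden).squeeze(-1)  # (batch, seq_len)


def reward_of(rm: ScalarHead, full_sequence: torch.Tensor) -> torch.Tensor:
    """Reward model: score a *complete* prompt+response. The scalar at the
    final token is read as 'how much would a human prefer this output?'"""
    return rm(full_sequence)[:, -1]  # one preference score per finished sequence


def value_of(vm: ScalarHead, state_so_far: torch.Tensor) -> torch.Tensor:
    """Value model (critic): queried at an *intermediate* state. Its scalar
    predicts the total expected future reward from this state onward,
    e.g. for computing advantages during policy optimization."""
    return vm(state_so_far)[:, -1]  # expected return from the current state


# Usage: same architecture, different questions asked of the scalar.
rm = ScalarHead(ToyBackbone(vocab_size=100, hidden_size=16), hidden_size=16)
vm = ScalarHead(ToyBackbone(vocab_size=100, hidden_size=16), hidden_size=16)
full = torch.randint(0, 100, (1, 12))  # a finished prompt+response
partial = full[:, :5]                  # a mid-generation state
print(reward_of(rm, full))     # judgment about a finished output
print(value_of(vm, partial))   # forecast of cumulative future reward
```

In this framing, the reward model's scalar is an immediate judgment about a completed output, while the value model's scalar is a forecast of cumulative future reward from a partial state, which is why only the latter is useful as a critic while generation is still in progress.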
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a system designed to align a language model with human preferences, one component functions as a 'critic'. It takes the current state (e.g., a conversation history) as input and outputs a single scalar value predicting the total expected future rewards from that state. This component's architecture is often a large language model with a final linear layer for the scalar output. Which statement best distinguishes this specific component from others in the system?
Diagnosing a Reinforcement Learning System