During the reinforcement learning phase of model alignment, the reward model's primary function is to output a binary classification for each generated response, labeling it as either 'preferred' or 'not preferred'.
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Continuous Supervision from the RLHF Reward Model
A language model is being aligned using feedback from human preferences. A separate model is first trained to distinguish between pairs of model-generated responses, learning to identify the better one in each pair. This model is then used to assign a single numerical value to each new response generated by the language model, guiding its optimization. What is the most significant advantage of this two-stage process?
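The two-stage process the question describes can be made concrete with a minimal sketch: a toy "reward model" is just a linear scorer over feature vectors, trained on one preference pair with a Bradley-Terry pairwise loss, then used to emit a single scalar per response. All names, features, and numbers here are illustrative assumptions, not any particular library's API.

```python
import math

def score(weights, features):
    """Stage 2: map a single response (feature vector) to one scalar reward."""
    return sum(w * f for w, f in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    """Stage 1 training signal (Bradley-Terry): -log sigmoid(r_chosen - r_rejected)."""
    margin = score(weights, chosen) - score(weights, rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy preference pair: the human-preferred response has larger feature values.
chosen, rejected = [1.0, 2.0], [0.5, 1.0]
weights = [0.0, 0.0]

# One step of finite-difference gradient descent on the pairwise loss.
eps, lr = 1e-5, 0.5
for i in range(len(weights)):
    bumped = weights.copy()
    bumped[i] += eps
    grad = (pairwise_loss(bumped, chosen, rejected)
            - pairwise_loss(weights, chosen, rejected)) / eps
    weights[i] -= lr * grad

# After training, the scorer assigns a higher scalar to the preferred response;
# that single continuous number (not a binary label) is what guides RL updates.
print(score(weights, chosen) > score(weights, rejected))
```

The key point for the card: training is pairwise and comparative, but inference yields a continuous scalar per response, which gives the RL optimizer a graded signal rather than a hard preferred/not-preferred label.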
The Reward Model's Functional Shift
Policy Gradient Objective Function for RL Fine-Tuning