During the reinforcement learning phase of model alignment, the reward model's primary function is to output a binary classification for each generated response, labeling it as either 'preferred' or 'not preferred'.
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Continuous Supervision from the RLHF Reward Model
A language model is being aligned using feedback from human preferences. A separate model is first trained to distinguish between pairs of model-generated responses, learning to identify the better one in each pair. This model is then used to assign a single numerical value to each new response generated by the language model, guiding its optimization. What is the most significant advantage of this two-stage process?
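The two-stage process the question describes can be made concrete with a minimal sketch: a toy "reward model" is just a linear scorer over feature vectors, trained on one preference pair with a Bradley-Terry pairwise loss, then used to emit a single scalar per response. All names, features, and numbers here are illustrative assumptions, not any particular library's API.

```python
import math

def score(weights, features):
    """Stage 2: map a single response (feature vector) to one scalar reward."""
    return sum(w * f for w, f in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    """Stage 1 training signal (Bradley-Terry): -log sigmoid(r_chosen - r_rejected)."""
    margin = score(weights, chosen) - score(weights, rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy preference pair: the human-preferred response has larger feature values.
chosen, rejected = [1.0, 2.0], [0.5, 1.0]
weights = [0.0, 0.0]

# One step of finite-difference gradient descent on the pairwise loss.
eps, lr = 1e-5, 0.5
for i in range(len(weights)):
    bumped = weights.copy()
    bumped[i] += eps
    grad = (pairwise_loss(bumped, chosen, rejected)
            - pairwise_loss(weights, chosen, rejected)) / eps
    weights[i] -= lr * grad

# After training, the scorer assigns a higher scalar to the preferred response;
# that single continuous number (not a binary label) is what guides RL updates.
print(score(weights, chosen) > score(weights, rejected))
```

The key point for the card: training is pairwise and comparative, but inference yields a continuous scalar per response, which gives the RL optimizer a graded signal rather than a hard preferred/not-preferred label.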
The Reward Model's Functional Shift
Policy Gradient Objective Function for RL Fine-Tuning