Learn Before
Rationale for Freezing the Reference Model in RLHF
During the initialization stage of Reinforcement Learning from Human Feedback (RLHF), the reference model's parameters are frozen and remain fixed throughout subsequent training. Analyze the primary reason for this design decision and predict the potential negative outcome if the reference model's parameters were instead updated alongside the policy model.
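The mechanism the question alludes to can be sketched in a few lines: the frozen reference model supplies log-probabilities against which a KL penalty is computed, so the penalty always measures drift from the *initial* policy. The following is a minimal illustrative sketch, not an implementation from any specific library; the function name, the per-token log-prob representation, and the `beta` value are all assumptions.

```python
import math

def kl_penalized_reward(rm_score, policy_logprobs, ref_logprobs, beta=0.1):
    """Combine a reward-model score with a KL penalty toward the reference.

    `ref_logprobs` come from a frozen copy of the initial policy. Because
    that copy is never updated, the penalty consistently measures how far
    the current policy has drifted from its starting point. If the
    reference were trained alongside the policy, it would follow the
    policy's drift and the penalty would stop constraining anything.
    (Illustrative sketch; names and beta are assumptions.)
    """
    # Monte-Carlo estimate of KL over the sampled tokens:
    # sum of (log pi(token) - log pi_ref(token)).
    kl = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return rm_score - beta * kl

# A policy that has not moved incurs zero penalty.
same = [-0.5, -1.2, -0.3]
unchanged = kl_penalized_reward(2.0, same, same)

# A policy that assigns higher probability than the frozen reference
# (i.e. has drifted toward reward-hacking outputs) is penalized.
drifted = [-0.1, -0.4, -0.05]
penalized = kl_penalized_reward(2.0, drifted, same)
```

Here `unchanged` equals the raw reward-model score, while `penalized` is strictly smaller, showing how the frozen reference anchors the policy and discourages degenerate, reward-hacked text.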
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Data Collection for Reward Modeling in RLHF
A machine learning team is implementing a training process that uses human feedback to align a language model. They have access to two base models: a general-purpose pre-trained language model (Model A) and a version of that model that has been further fine-tuned on a set of instructions (Model B). For the first stage of their process, which of the following initialization plans is correct for the policy, reference, reward, and value models?
Analyzing an RLHF Initialization Error