Learn Before
Role and Definition of the Reference Model in RLHF
In RLHF, the reference model serves as the baseline Large Language Model from which policy training starts. It is typically a prior version of the LLM being trained, or a model fine-tuned without human feedback, such as an SFT model. During the policy-training phase, the reference model serves two key functions: it is used for sampling across the range of possible outputs, and it is a component of the loss calculation, regulating how far the policy may drift from the baseline during updates.
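One common way to see this regulating role (a sketch in the usual notation, not this card's own formula): the policy $\pi_\theta$ is trained to maximize the reward while a KL term penalizes divergence from the frozen reference model $\pi_{\text{ref}}$,

$$\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\Big[\, r(x, y) \;-\; \beta\, \mathrm{KL}\big(\pi_\theta(\cdot \mid x)\,\|\,\pi_{\text{ref}}(\cdot \mid x)\big) \Big]$$

where $r(x, y)$ is the reward model's score for response $y$ to prompt $x$, and $\beta$ controls how strongly the policy is kept close to the reference.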
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Objective Function for Policy Learning in RLHF
Use of Proximal Policy Optimization (PPO) in RLHF
Application of A2C in RLHF for LLM Alignment
Role and Definition of the Reference Model in RLHF
Joint Optimization of Policy and Value Functions in RLHF
RLHF Policy Optimization Objective
Reference Policy in RLHF
RLHF Policy Optimization as Loss Minimization
A language model is being fine-tuned using an iterative feedback process. In each step, the model generates a response to a prompt. A separate, pre-trained scoring model then assigns a numerical score to this response based on its quality. What is the most direct and immediate use of this numerical score within a single step of this training loop?
Arrange the following events into the correct chronological order as they would occur within a single iterative step of the policy learning phase for a language model.
Diagnosing a Training Failure in an Iterative Fine-Tuning Process
Direct Preference Optimization (DPO)
Learn After
During the fine-tuning of a language model using a reward signal, a team observes that the model's outputs are becoming nonsensical, even though they receive high reward scores. The model is essentially 'gaming' the reward system. Which component in this training setup is specifically intended to mitigate this issue by penalizing the model for deviating too far from its initial, coherent language patterns?
Diagnosing Training Stagnation in a Reward-Based System
Evaluating Reference Model Selection in Reward-Based Training