Learn Before
Target Model (Policy Model) in RLHF
In the RLHF framework, the Target Model, also known as the policy model, is the Large Language Model being actively trained. Its policy, denoted as $\pi_\theta$ and formally defined as the probability distribution $\pi_\theta(y_t \mid x, y_{<t})$, governs the generation of the next token $y_t$ based on the prompt $x$ and the previously generated tokens $y_{<t}$. The model's parameters, $\theta$, are updated during training under the guidance of both the reward and value models.
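A minimal sketch of this role in code may help. It assumes a Hugging Face causal LM; the model name ("gpt2"), the advantage value, and the single-step REINFORCE-style update are all illustrative simplifications (production RLHF typically uses PPO with advantages computed from the reward and value models over full responses):

```python
# Sketch: the policy model pi_theta produces a next-token distribution,
# samples from it, and has its parameters updated by a reward-weighted
# policy-gradient step. Model name and advantage value are illustrative.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # assumed model
policy = AutoModelForCausalLM.from_pretrained("gpt2")      # pi_theta

prompt = "The best way to learn a new skill is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# pi_theta(y_t | x, y_<t): a distribution over the vocabulary for the
# next token, conditioned on the current context.
logits = policy(input_ids).logits[:, -1, :]                # (1, vocab_size)
next_token_dist = F.softmax(logits, dim=-1)

# Generation: sample the next token from the policy's distribution.
next_token = torch.multinomial(next_token_dist, num_samples=1)

# Training: a toy REINFORCE-style update, where `advantage` stands in
# for the signal derived from the reward and value models.
advantage = torch.tensor(0.7)                              # illustrative
log_prob = torch.log(next_token_dist.gather(1, next_token))
loss = -(advantage * log_prob).mean()
loss.backward()                                            # grads w.r.t. theta
```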
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Architecture and Function of the RLHF Value Model
Reference Policy Definition in RLHF
Architecture and Function of the RLHF Reward Model
A development team is building a system to align a large language model using reinforcement learning from human feedback. Their setup includes a target model for text generation, a reference model, a reward model to score outputs based on human preferences, and a value model to predict future rewards. For computational efficiency, they decide to build the reward model using a Convolutional Neural Network (CNN) and the value model using a Recurrent Neural Network (RNN), while keeping the target and reference models as Transformer decoders. What is the most significant architectural inconsistency in this design compared to a standard implementation?
LLM as the Agent in RLHF
An alignment process for a large language model uses a system composed of four distinct models, all sharing a common underlying architecture. Match each model component with its primary role in this system.
Architectural Consistency in Feedback-Based LLM Alignment
In a typical system for aligning a language model with human feedback, it is common practice to use a Transformer-based architecture for the text-generating models, while employing simpler, non-Transformer architectures for the reward and value models to reduce computational overhead.
Learn After
An engineering team is refining a large language model. During one step of the process, the model is given the start of a sentence, 'The best way to learn a new skill is'. The model then calculates a probability for every word in its vocabulary to be the next word and uses this distribution to generate a complete sentence. The model's internal parameters are then updated based on a separate quality assessment of the generated sentence. Which part of this process best describes the primary role of the model being actively trained (the 'policy model')?
Mechanism of Policy Model Refinement
Analyzing Policy Model Behavior