Definition

Target Model (Policy Model) in RLHF

In the RLHF framework, the Target Model, also known as the policy model, is the Large Language Model being actively trained. Its policy, denoted $\pi_{\theta}(\cdot)$ and formally defined as the probability distribution $\text{Pr}_{\theta}(\cdot)$, governs the generation of the next token given the current context. The model's parameters, $\theta$, are updated during training under the guidance of both the reward and value models.
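To make the idea of a parameterized next-token policy concrete, here is a minimal toy sketch in plain Python. Everything in it is a hypothetical stand-in: `ToyPolicy` conditions only on the previous token through a table of logits `theta` (a real LLM conditions on the whole context with a Transformer), and `VOCAB` is a four-token toy vocabulary. The update step is a simple REINFORCE-style policy-gradient step, not the clipped PPO objective typically used in RLHF. The per-token `advantages` would, in a PPO-style setup, be derived from the reward model's score and the value model's estimates; here they are passed in directly as plain numbers.

```python
import math
import random

# Hypothetical toy vocabulary; real LLM vocabularies have tens of thousands of tokens.
VOCAB = ["<eos>", "the", "cat", "sat"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyPolicy:
    """Toy stand-in for pi_theta: theta[prev][nxt] is the logit for token nxt after token prev."""

    def __init__(self, vocab_size, seed=0):
        rng = random.Random(seed)
        self.theta = [[rng.gauss(0.0, 0.1) for _ in range(vocab_size)]
                      for _ in range(vocab_size)]

    def next_token_probs(self, context):
        # Pr_theta(next token | context); this toy only looks at the last token.
        return softmax(self.theta[context[-1]])

    def log_prob(self, context, token):
        return math.log(self.next_token_probs(context)[token])


def sequence_log_prob(policy, prompt, response):
    """log Pr_theta(response | prompt) as a sum of per-token log-probabilities."""
    context = list(prompt)
    total = 0.0
    for tok in response:
        total += policy.log_prob(context, tok)
        context.append(tok)
    return total


def policy_gradient_step(policy, prompt, response, advantages, lr=0.1):
    """REINFORCE-style step: theta += lr * sum_t A_t * grad log pi(y_t | context_t).

    For a softmax over logits z, d log p_k / d z_j = 1[j == k] - p_j.
    Gradients are accumulated first so every term uses the same (old) theta.
    """
    vocab_size = len(policy.theta)
    grad = [[0.0] * vocab_size for _ in range(vocab_size)]
    context = list(prompt)
    for tok, adv in zip(response, advantages):
        prev = context[-1]
        probs = policy.next_token_probs(context)
        for j in range(vocab_size):
            indicator = 1.0 if j == tok else 0.0
            grad[prev][j] += adv * (indicator - probs[j])
        context.append(tok)
    for i in range(vocab_size):
        for j in range(vocab_size):
            policy.theta[i][j] += lr * grad[i][j]


if __name__ == "__main__":
    policy = ToyPolicy(len(VOCAB), seed=0)
    prompt, response = [1], [2, 3, 0]  # "the" -> "cat sat <eos>"
    before = sequence_log_prob(policy, prompt, response)
    # Positive advantages stand in for "the reward model liked this response".
    policy_gradient_step(policy, prompt, response, [1.0, 1.0, 1.0])
    after = sequence_log_prob(policy, prompt, response)
    print(f"log-prob before: {before:.4f}, after: {after:.4f}")
```

With positive advantages the step raises the log-probability the policy assigns to the sampled response; with negative advantages it lowers it. This is the basic mechanism by which reward and value signals reshape $\pi_{\theta}$, although practical RLHF adds safeguards such as clipping and a KL penalty against a reference model.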


Updated 2026-05-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences