Formula

Reference Policy Definition in RLHF

The reference policy in Reinforcement Learning from Human Feedback (RLHF), denoted $\pi_{\theta_{\text{ref}}}(\cdot)$, is defined as the probability distribution $\text{Pr}_{\theta_{\text{ref}}}(\cdot)$ determined by the parameters of the reference model, $\theta_{\text{ref}}$. In symbols: $\pi_{\theta_{\text{ref}}}(\cdot) = \text{Pr}_{\theta_{\text{ref}}}(\cdot)$.
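To make the definition concrete, the following is a minimal Python sketch of a policy that is nothing more than the probability distribution induced by a fixed set of parameters. Everything in it is an illustrative assumption rather than part of the definition above: the three-token vocabulary `VOCAB`, the toy parameter table `THETA_REF`, the helper names `softmax`, `pi_ref`, and `log_prob_ref`, and the simplification that each token depends only on the previous one.

```python
import math

# Hypothetical toy vocabulary and "reference model" parameters (theta_ref).
# Each context token maps to unnormalized scores (logits) over VOCAB.
VOCAB = ["a", "b", "c"]
THETA_REF = {
    "<s>": [1.0, 0.0, -1.0],
    "a": [0.0, 2.0, 0.0],
    "b": [0.5, 0.5, 0.5],
    "c": [-1.0, 0.0, 1.0],
}


def softmax(logits):
    """Turn logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pi_ref(context_token, theta=THETA_REF):
    """pi_ref(. | context): the distribution determined by theta alone."""
    return dict(zip(VOCAB, softmax(theta[context_token])))


def log_prob_ref(tokens, theta=THETA_REF):
    """Log-probability of a token sequence under the reference policy.

    Assumes (for illustration only) that each token depends only on the
    previous token, starting from the "<s>" marker.
    """
    prev, total = "<s>", 0.0
    for tok in tokens:
        total += math.log(pi_ref(prev, theta)[tok])
        prev = tok
    return total


if __name__ == "__main__":
    print(pi_ref("<s>"))
    print(log_prob_ref(["a", "b"]))
```

The point of the sketch is that `pi_ref` has no state of its own: once `THETA_REF` is fixed, the distribution is fixed, which mirrors the identity $\pi_{\theta_{\text{ref}}}(\cdot) = \text{Pr}_{\theta_{\text{ref}}}(\cdot)$.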

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences