1Cademy - Reference Policy Definition in RLHF

Learn Before

Architectural Components of an RLHF System

Formula

Reference Policy Definition in RLHF

The reference policy in Reinforcement Learning from Human Feedback (RLHF), denoted $\pi_{\theta_{\text{ref}}}(\cdot)$, is defined as the probability distribution $\text{Pr}_{\theta_{\text{ref}}}(\cdot)$ determined by the parameters $\theta_{\text{ref}}$ of the reference model. Formally, $\pi_{\theta_{\text{ref}}}(\cdot) = \text{Pr}_{\theta_{\text{ref}}}(\cdot)$. The two symbols name the same distribution: the policy is not a separate function fitted to the model's output.
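As a minimal sketch of this identity, the toy code below treats the reference model as a fixed table of next-token logits and obtains the reference policy by calling the model's distribution directly. The names `THETA_REF`, `pr_theta_ref`, and `pi_ref`, the example prompt, and the logit values are all hypothetical, and conditioning on a prompt is an assumption made for illustration; a real reference model would be a frozen neural network rather than a lookup table.

```python
import math

# Hypothetical frozen parameters theta_ref: a toy table of next-token logits.
# In practice these would be the weights of a frozen language model.
THETA_REF = {
    "the result was": {"consequently": 0.1, "therefore": 1.2, "good": 2.0},
}


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pr_theta_ref(prompt):
    """Pr_{theta_ref}(. | prompt): the reference model's next-token distribution."""
    logits = THETA_REF[prompt]
    probs = softmax(list(logits.values()))
    return dict(zip(logits.keys(), probs))


def pi_ref(prompt):
    """Reference policy: by definition the same distribution, with no extra training."""
    return pr_theta_ref(prompt)


if __name__ == "__main__":
    prompt = "the result was"
    policy = pi_ref(prompt)
    model = pr_theta_ref(prompt)
    for token in model:
        # Both columns agree for every token, since pi_ref is Pr_theta_ref.
        print(f"{token:>12}  pi_ref={policy[token]:.4f}  Pr={model[token]:.4f}")
```

Because `pi_ref` only delegates to `pr_theta_ref`, any probability reported by the reference model for a token is, by definition, the reference policy's probability for that token.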

Updated 2026-06-23


References

Reference of Foundations of Large Language Models Course

Learn After

In a reinforcement learning process that uses human feedback, a 'reference model' with a fixed set of parameters, $\theta_{\text{ref}}$, is used as a baseline. For a specific input prompt, this model calculates that the probability of generating the word 'consequently' as the next word is 0.04. Given that the reference policy, $\pi_{\theta_{\text{ref}}}(\cdot)$, is formally defined as the probability distribution generated by this reference model, what is the value of $\pi_{\theta_{\text{ref}}}(
True or False: In a Reinforcement Learning from Human Feedback (RLHF) system, the reference policy $\pi_{\theta_{\text{ref}}}(\cdot)$ is a function that is trained to approximate the probability distribution $\text{Pr}_{\theta_{\text{ref}}}(\cdot)$ generated by the reference model.
Reference Policy and Model Probability
