Learn Before
Reference Policy (π_ref)
A reference policy, denoted π_ref, is a fixed policy used as a baseline in certain reinforcement learning algorithms. It is typically an earlier version of the policy being trained or a pre-trained model. Training aims to improve the current policy (π_θ) without letting it deviate too far from the reference policy, which helps stabilize learning.
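A common way to enforce this constraint is to add a penalty proportional to the KL divergence between π_θ and π_ref to the training objective, i.e., maximize expected reward minus β·KL(π_θ ‖ π_ref). Below is a minimal PyTorch sketch of such a penalty term; the function name kl_penalty and the coefficient beta are illustrative assumptions, not part of any specific library or the course material.

```python
import torch
import torch.nn.functional as F

def kl_penalty(logits_current, logits_ref, beta=0.1):
    """Per-token KL penalty between the current policy (pi_theta)
    and a frozen reference policy (pi_ref).

    logits_current, logits_ref: tensors of shape (batch, seq_len, vocab).
    beta: penalty coefficient (illustrative value; tuned in practice).
    """
    logp_current = F.log_softmax(logits_current, dim=-1)
    logp_ref = F.log_softmax(logits_ref, dim=-1)
    # KL(pi_theta || pi_ref), summed over the vocabulary at each position
    kl = (logp_current.exp() * (logp_current - logp_ref)).sum(dim=-1)
    return beta * kl.mean()

# Usage sketch: total_loss = task_loss + kl_penalty(cur_logits, ref_logits)
```

Because π_ref is frozen, the penalty grows only when the trained model's output distribution drifts away from the baseline, which is what keeps updates conservative.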

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Reference Policy (π_ref)
Policy Probability Ratio (Ratio Function)
An autonomous agent is being trained to navigate a maze. The agent's decision-making process at any given intersection (a 'state') is determined by a specific component of its programming. Which of the following scenarios best exemplifies this decision-making component?
An autonomous agent is programmed to navigate a grid. When it reaches a specific grid cell (state 'S'), it must choose an action. Consider two different versions of the agent's programming:
- Agent 1: When in state 'S', it is programmed to always choose the action 'move North'.
- Agent 2: When in state 'S', it is programmed to choose 'move North' with 70% probability and 'move East' with 30% probability.
Which statement best analyzes the difference in how these two agents map states to actions?
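For illustration, the two mappings in this question can be sketched as policy functions. This is a minimal Python sketch with illustrative names, assuming Agent 1 implements a deterministic policy (each state maps to a single action) and Agent 2 a stochastic policy (each state maps to a probability distribution over actions):

```python
import random

# Agent 1: deterministic policy -- state maps to exactly one action
def agent1_policy(state):
    if state == "S":
        return "move North"

# Agent 2: stochastic policy -- state maps to a distribution over actions
def agent2_policy(state):
    if state == "S":
        return random.choices(["move North", "move East"],
                              weights=[0.7, 0.3])[0]
```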
An agent's goal is to navigate a simple environment and maximize its total reward. The agent is currently in a state 'S'. From this state, it can take one of two actions: 'Action 1' which consistently leads to a reward of +10, or 'Action 2' which consistently leads to a reward of -5. Consider two possible behavior patterns for the agent when it is in state 'S':
- Behavior A: The agent chooses 'Action 1' with a 100% probability.
- Behavior B: The agent chooses 'Action 1' with a 50% probability and 'Action 2' with a 50% probability.
Which behavior pattern is superior for achieving the agent's goal, and why?
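One way to reason about this question is to compare the expected reward of each behavior pattern. The short sketch below works through that arithmetic; all names and values are taken from the scenario above, and the code structure is illustrative.

```python
# Action rewards from the scenario
rewards = {"Action 1": 10, "Action 2": -5}

# Each behavior pattern as a probability distribution over actions
behavior_a = {"Action 1": 1.0}                    # always Action 1
behavior_b = {"Action 1": 0.5, "Action 2": 0.5}   # 50/50 split

def expected_reward(policy):
    return sum(p * rewards[a] for a, p in policy.items())

print(expected_reward(behavior_a))  # 10.0
print(expected_reward(behavior_b))  # 2.5  (0.5*10 + 0.5*(-5))
```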
Learn After
A team is fine-tuning a large language model (the 'active model') to improve its performance on a specific task. They use the original, pre-trained version of the model as a fixed baseline. During training, a penalty is applied to the active model whenever its output probabilities for generating the next piece of text diverge significantly from the baseline model's probabilities. What is the most likely reason for incorporating this penalty mechanism?
Analysis of Constrained vs. Unconstrained Model Training
Stabilizing Model Fine-Tuning