1Cademy - Policy Probability Ratio Less Than One

Learn Before

Less Than Inequality
Policy Probability Ratio (Ratio Function)

Formula

Policy Probability Ratio Less Than One

The condition $\frac{\pi_{\theta}(a_t|s_t)}{\pi_{\theta_{\mathrm{ref}}}(a_t|s_t)} < 1$ signifies that a specific action $a_t$ is less favored by the current policy $\pi_{\theta}$ than by the reference policy $\pi_{\theta_{\mathrm{ref}}}$ . This indicates that the current policy is less likely to choose that particular action compared to the reference policy.