Learn Before
Policy Probability Ratio Greater Than One
The inequality π(a|s) > π_ref(a|s) — equivalently, a probability ratio π(a|s) / π_ref(a|s) greater than one — expresses the condition where the probability of selecting action a in state s under the current policy π is greater than the probability under a reference policy π_ref. This signifies that the current policy is more likely to choose the action than the reference policy. This comparison is a fundamental component of certain reinforcement learning algorithms, particularly policy optimization methods, where the ratio measures how the updated policy's behavior has shifted relative to a baseline or previous iteration of the policy.
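A minimal sketch of this comparison in code. The policies below are toy probability tables and the function name is illustrative, not from any particular library:

```python
# Sketch: the policy probability ratio pi_current(a|s) / pi_ref(a|s),
# as used in policy-optimization methods. Toy tabular policies only.

def probability_ratio(pi_current, pi_ref, state, action):
    """Return pi_current(action | state) / pi_ref(action | state)."""
    return pi_current[state][action] / pi_ref[state][action]

# In state "s0", the current policy assigns "a1" a higher probability
# than the reference policy did, so the ratio exceeds one.
pi_ref = {"s0": {"a0": 0.6, "a1": 0.4}}
pi_current = {"s0": {"a0": 0.2, "a1": 0.8}}

ratio = probability_ratio(pi_current, pi_ref, "s0", "a1")
print(ratio)  # 2.0 — the current policy is more likely to pick "a1"
```

A ratio above one marks actions the current policy now favors; a ratio below one marks actions it has moved away from.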

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Increased Action Probability Condition
Policy Probability Ratio Less Than One
Bound Function for Policy Probability Ratio
Policy Probability Ratio Greater Than One
Upper-Bound Clipping Function for Policy Ratios
Evaluating a Policy Change
In an off-policy reinforcement learning scenario, an agent is in a specific state. The policy that originally collected the training data (the reference policy) selected a particular action with a probability of 0.2. The agent's current, updated policy would select that same action with a probability of 0.8. What does the resulting probability ratio imply about how the reward for this action-state pair should be treated during the policy update?
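The arithmetic behind this scenario can be sketched with plain importance sampling, where the off-policy reward is reweighted by the probability ratio (a simplified view; the exact treatment depends on the algorithm, e.g., clipping in PPO):

```python
# Importance-sampling sketch of the scenario above: the reward observed
# under the reference (data-collecting) policy is reweighted by the
# ratio of current-policy to reference-policy action probabilities.

p_ref = 0.2      # probability of the action under the reference policy
p_current = 0.8  # probability of the same action under the current policy

ratio = p_current / p_ref  # 4.0: the action is now four times as likely
reward = 1.0               # illustrative reward for this state-action pair
weighted_reward = ratio * reward

print(ratio, weighted_reward)  # 4.0 4.0
```

Because the ratio is greater than one, the reward's contribution to the policy update is amplified rather than discounted.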
Interpreting Policy Changes
Learn After
An autonomous agent is being trained to navigate a maze. At a specific intersection (a 'state'), it can either 'turn left' or 'turn right' (the 'actions'). We compare the agent's current decision-making strategy to its initial, less-developed strategy. For the action 'turn left' at this intersection, the ratio of its probability under the current strategy to its probability under the initial strategy is 2.5. What is the most accurate interpretation of this value?
Analyzing Policy Updates in a Game-Playing AI
An AI agent is being trained to play a video game. The training process aims to increase the likelihood that the agent performs a specific beneficial action, 'use health potion', when its health is low. After a successful training update that achieves this goal, the ratio of the probability of 'use health potion' under the new policy to its probability under the old policy will be less than 1.