1Cademy - Increased Action Probability Condition

Learn Before

Greater Than Inequality
Policy Probability Ratio (Ratio Function)

Formula

Increased Action Probability Condition

The inequality $\frac{\pi_{\theta}(a_t|s_t)}{\pi_{\theta_{\text{ref}}}(a_t|s_t)} > 1$ means that the current policy $\pi_{\theta}$ assigns a higher probability to action $a_t$ in state $s_t$ than the reference policy $\pi_{\theta_{\text{ref}}}$ does. In reinforcement learning, this condition is usually desirable for actions that have proven advantageous, because it shows that the policy has been updated to make such an action more likely.
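
As a rough illustration, the condition can be checked numerically. The sketch below assumes that each policy exposes the log-probability of the chosen action, which is a common but not universal convention; the function names and the probabilities 0.30 and 0.20 are hypothetical and chosen only for this example. It relies on the fact that the ratio exceeds 1 exactly when the current log-probability exceeds the reference log-probability.

```python
import math


def probability_ratio(logp_current: float, logp_ref: float) -> float:
    """Return pi_theta(a|s) / pi_ref(a|s), computed from log-probabilities."""
    # Subtracting logs before exponentiating avoids dividing tiny probabilities.
    return math.exp(logp_current - logp_ref)


def action_probability_increased(logp_current: float, logp_ref: float) -> bool:
    """True when the ratio is greater than 1, i.e. logp_current > logp_ref."""
    return logp_current > logp_ref


# Hypothetical probabilities of the same action under each policy.
p_current, p_ref = 0.30, 0.20

ratio = probability_ratio(math.log(p_current), math.log(p_ref))
print(f"ratio = {ratio:.3f}")  # about 1.5 for these illustrative values
print(action_probability_increased(math.log(p_current), math.log(p_ref)))
```

Comparing log-probabilities directly, rather than forming the ratio first, gives the same answer and is often more convenient when the policies are evaluated in log space.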