1Cademy - Asymmetric Effect of Upper-Bound Clipping

Learn Before

Upper-Bound Clipping Function for Policy Ratios

Short Answer

Asymmetric Effect of Upper-Bound Clipping

In a policy gradient algorithm, a clipping function is defined as min(ratio, 1 + ε), where ratio is the probability the new policy assigns to an action divided by the probability the old policy assigns to it, and ε is a small positive hyperparameter. Consider two scenarios for a given state-action pair:

The advantage is positive (+3.0) and the ratio is 1.8.
The advantage is negative (-3.0) and the ratio is 1.8.

Assuming ε = 0.2, explain how the clipping function affects the magnitude of the policy update in each scenario and analyze the reasoning behind this differential treatment.
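
As a starting point for the arithmetic, the following is a minimal sketch that evaluates the clipping function on both scenarios. The function names clip_upper and surrogate_term are hypothetical, and the sketch assumes that the update is driven by the clipped ratio multiplied by the advantage, which the question does not state explicitly.

```python
def clip_upper(ratio: float, epsilon: float) -> float:
    """Upper-bound clip as defined in the question: min(ratio, 1 + epsilon)."""
    return min(ratio, 1.0 + epsilon)


def surrogate_term(ratio: float, advantage: float, epsilon: float) -> float:
    """Clipped ratio times advantage.

    Assumption: this product is what scales the policy update; the
    question itself only defines the clip on the ratio.
    """
    return clip_upper(ratio, epsilon) * advantage


EPSILON = 0.2
RATIO = 1.8

# Compare the unclipped and clipped terms for both scenarios.
for advantage in (3.0, -3.0):
    unclipped = RATIO * advantage
    clipped = surrogate_term(RATIO, advantage, EPSILON)
    print(
        f"advantage={advantage:+.1f}  "
        f"unclipped={unclipped:+.2f}  clipped={clipped:+.2f}"
    )
```

Under that assumption the clip lowers the ratio from 1.8 to 1.2 in both scenarios, so the sign of the advantage decides whether the clipped term is smaller or larger than the unclipped one. How an objective chooses between the two terms is the part the question asks you to reason about.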

Updated 2025-10-04
