1Cademy - Applying a Bounding Constraint on Probability Ratios

Learn Before

Bound Function for Policy Probability Ratio

Short Answer


In a reinforcement learning algorithm, the ratio of an action's probability under the new policy to its probability under the old policy is constrained to stay within the interval [1 - ε, 1 + ε] to keep training stable. If the constraint parameter ε is set to 0.25, what are the final constrained values of the following two independently calculated ratios?

Case 1, initial ratio: 1.40
Case 2, initial ratio: 0.65

Provide the final value for each case.
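The bounding step amounts to clamping the ratio to [1 - ε, 1 + ε]: values above 1 + ε are replaced by 1 + ε, values below 1 - ε by 1 - ε, and values inside the interval pass through unchanged. The minimal Python sketch below illustrates this clamp; the helper name `clip_ratio` is hypothetical and not taken from any particular library, and it assumes the ratio has already been computed as a plain float.

```python
def clip_ratio(ratio: float, epsilon: float) -> float:
    """Clamp a policy probability ratio to [1 - epsilon, 1 + epsilon].

    Hypothetical helper for illustration only.
    """
    lower, upper = 1.0 - epsilon, 1.0 + epsilon
    # max() raises values below the lower bound; min() caps values above the upper bound.
    return min(max(ratio, lower), upper)


epsilon = 0.25
for r in (1.40, 0.65):
    print(f"{r:.2f} -> {clip_ratio(r, epsilon):.2f}")
# prints:
# 1.40 -> 1.25
# 0.65 -> 0.75
```

With ε = 0.25 the interval is [0.75, 1.25], so 1.40 is capped at the upper bound and 0.65 is raised to the lower bound.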

Updated 2025-10-08
