Approximation of the Policy Divergence Penalty
In practice, the policy divergence penalty, which is defined via the log-probability of an entire trajectory, can be simplified: the penalty is computed from the policy's per-step action probabilities alone, ignoring the environment's transition dynamics. This yields a more computationally tractable measure.
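As a sketch of why this simplification works (the symbols here are assumptions, not from the card: μ for the initial-state distribution, P for the transition dynamics, and π_θ / π_ref for the current and reference policies), the log-probability of a trajectory τ = (s_1, a_1, …, s_T, a_T) decomposes as:

```latex
\log p_\theta(\tau) \;=\; \log \mu(s_1)
  \;+\; \sum_{t=1}^{T} \log \pi_\theta(a_t \mid s_t)
  \;+\; \sum_{t=1}^{T-1} \log P(s_{t+1} \mid s_t, a_t)
```

The initial-state and transition terms do not depend on the policy parameters, so in the difference log p_θ(τ) − log p_ref(τ) they appear under both policies. Dropping them leaves only the sum of per-step action log-ratios, Σ_t [log π_θ(a_t|s_t) − log π_ref(a_t|s_t)], which can be computed without any model of the environment's dynamics.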

Tags
Ch.4 Alignment - Foundations of Large Language Models
Computing Sciences
Related
Policy Divergence Penalty for Language Models
In a policy optimization process, a penalty is used to measure the change between a current policy, π_θ, and a reference policy, π_ref. The penalty for a specific sequence of states and actions (a trajectory, τ) is calculated as the log-ratio of the trajectory's probability under the two policies: penalty(τ) = log π_θ(τ) − log π_ref(τ).
If the calculated penalty for a particular trajectory is a large positive value, what is the most accurate interpretation?
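To make the interpretation concrete, here is a minimal sketch in Python. It assumes the simplified form of the penalty (a sum of per-step action log-probability differences); the function name and the numeric per-step log-probabilities are hypothetical, chosen only for illustration.

```python
import math

def trajectory_penalty(logp_current, logp_reference):
    """Simplified divergence penalty for one trajectory:
    sum over steps of log pi_current(a_t|s_t) - log pi_ref(a_t|s_t)."""
    return sum(c - r for c, r in zip(logp_current, logp_reference))

# Hypothetical per-step action log-probs for a 3-step trajectory.
current = [math.log(0.5), math.log(0.4), math.log(0.9)]
reference = [math.log(0.25), math.log(0.4), math.log(0.3)]

penalty = trajectory_penalty(current, reference)
# A large positive penalty means the current policy assigns this
# trajectory a much higher probability than the reference policy does.
```

Here the penalty equals log(0.5/0.25) + log(0.4/0.4) + log(0.9/0.3) = log 6 ≈ 1.79, a positive value because the current policy favors this trajectory more than the reference does.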
Calculating Policy Divergence Penalty
Interpreting Policy Divergence
Learn After
Approximated Policy Divergence Penalty Formula
In reinforcement learning, a penalty is often used to limit how much a new policy deviates from a previous one. The exact penalty considers the probability of an entire sequence of states and actions. A common practical simplification is to calculate this penalty based only on the sum of action probabilities at each step, effectively ignoring the environment's state transition probabilities. What is the primary consequence of this simplification?
Choosing a Policy Divergence Penalty
In reinforcement learning, a penalty is often used to prevent a policy from changing too drastically. The exact penalty is based on the probability of an entire sequence of states and actions. A common simplification calculates this penalty by summing the probabilities of each action taken, without considering the probabilities of transitioning between states.
Statement: This simplified approach is preferred because it provides a more precise measure of the policy's change by isolating the agent's decision-making process from environmental randomness.