Learn Before
Policy Gradient Reformulation using Advantage Function
The policy gradient, i.e. the gradient of the objective function with respect to the policy parameters, can be reformulated to use the advantage function in place of the raw return. This substitution is a standard technique in policy gradient algorithms because subtracting a baseline from the return reduces the high variance of gradient estimates, leading to more stable and efficient learning.
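In the usual notation (parameters θ, policy π_θ, action-value Q^π, state value V^π; the symbols are assumed here, since the source's inline formulas were stripped), the reformulation reads:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s,\,a \sim \pi_\theta}\!\left[
      \nabla_\theta \log \pi_\theta(a \mid s)\, A^{\pi}(s, a)
    \right],
\qquad
A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s).
```

Because the state value V^π(s) does not depend on the action, subtracting it leaves the gradient's expectation unchanged while shrinking its variance.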
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Advantage Function Estimation using Reward-to-Go
An autonomous agent in a reinforcement learning environment is in a particular state. From this state, the expected cumulative future reward, when averaged across all possible actions, is calculated to be 50 points. The agent is evaluating three specific actions:
- Action X: The expected cumulative reward for taking this action is 65 points.
- Action Y: The expected cumulative reward for taking this action is 40 points.
- Action Z: The expected cumulative reward for taking this action is 50 points.
Based on this information, which statement provides the most accurate analysis for guiding the agent's next policy update?
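The arithmetic behind the scenario above can be checked directly with the definition A(s, a) = Q(s, a) − V(s); the function and variable names below are illustrative, with the numbers taken from the list above:

```python
def advantage(q_value: float, state_value: float) -> float:
    """Advantage A(s, a) = Q(s, a) - V(s): how much better an action is
    than the average action from the same state."""
    return q_value - state_value

# V(s): expected cumulative reward averaged over all actions from this state.
state_value = 50.0

# Q(s, a) for each candidate action, from the scenario.
q_values = {"X": 65.0, "Y": 40.0, "Z": 50.0}

advantages = {a: advantage(q, state_value) for a, q in q_values.items()}
print(advantages)  # {'X': 15.0, 'Y': -10.0, 'Z': 0.0}
```

A positive advantage (Action X) means the policy update should make that action more likely, a negative one (Action Y) less likely, and a zero advantage (Action Z) contributes no net push in either direction.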
In a reinforcement learning scenario, an agent in a specific state calculates that the 'advantage' of performing a particular action is exactly zero. What is the most accurate interpretation of this finding?
Temporal Difference (TD) Error as an Advantage Function Estimator
Analysis of an Agent's Suboptimal Policy
Learn After
Stabilizing Policy Gradient Learning in a High-Variance Environment
A reinforcement learning agent is being trained using a policy gradient method. During training, the agent's performance is highly erratic, and the estimated gradients for policy updates have very high variance. Which of the following changes to the gradient estimation process is most directly aimed at stabilizing learning by reducing this variance?
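The variance-reduction effect the question asks about can be demonstrated with a toy Monte Carlo sketch (the numbers and the one-state setup are illustrative, not from the source): the same score-function samples are scaled either by the raw return or by the return minus a baseline, and the latter estimator has much smaller variance:

```python
import random
import statistics

random.seed(0)

def gradient_sample_variance(baseline: float, n: int = 10_000) -> float:
    """Empirical variance of one-sample policy gradient estimates
    score * (return - baseline) in a toy one-state problem."""
    samples = []
    for _ in range(n):
        score = random.choice([1.0, -1.0])        # stand-in for grad log pi(a|s)
        ret = random.gauss(50.0, 20.0)            # noisy return G around 50
        samples.append(score * (ret - baseline))  # one-sample gradient estimate
    return statistics.variance(samples)

var_plain = gradient_sample_variance(baseline=0.0)   # scale by raw return
var_base = gradient_sample_variance(baseline=50.0)   # subtract the state value
print(var_base < var_plain)  # baseline-corrected estimator has lower variance
```

Since the baseline does not depend on the action, both estimators have the same expectation, which is why subtracting the (estimated) state value, i.e. using the advantage, stabilizes learning without biasing the gradient.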
Rationale for Using the Advantage Function