Advantage of Policy Gradients: Non-Differentiable Reward Functions
A significant advantage of policy gradient methods is that the cumulative reward function R(τ) need not be differentiable. The gradient is taken with respect to the log-probability of the trajectory under the policy, ∂ log Pr_θ(τ)/∂θ, while R(τ) enters the estimate only as a scalar weight on that term. Consequently, any type of reward function can be used in reinforcement learning, including discontinuous or arbitrarily complex ones, such as a binary win/lose signal.
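For reference, here is the standard log-derivative derivation behind this property, written in LaTeX (J(θ), Pr_θ(τ), and R(τ) as defined in the questions below):

\nabla_\theta J(\theta)
  = \nabla_\theta \sum_{\tau} \mathrm{Pr}_\theta(\tau)\, R(\tau)
  = \sum_{\tau} R(\tau)\, \nabla_\theta \mathrm{Pr}_\theta(\tau)
  = \sum_{\tau} \mathrm{Pr}_\theta(\tau)\, R(\tau)\, \nabla_\theta \log \mathrm{Pr}_\theta(\tau)
  = \mathbb{E}_{\tau \sim \mathrm{Pr}_\theta}\big[\, R(\tau)\, \nabla_\theta \log \mathrm{Pr}_\theta(\tau) \,\big],

using \nabla_\theta \log \mathrm{Pr}_\theta(\tau) = \nabla_\theta \mathrm{Pr}_\theta(\tau) / \mathrm{Pr}_\theta(\tau). Since R(τ) appears only as a multiplicative weight, no derivative of the reward is taken at any step.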
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Policy Gradient Theorem
Decomposition of the Trajectory Log-Probability Gradient
Policy Gradient Objective with Advantage Function
Policy Gradient Estimate under Uniform Trajectory Probability
Score Function in Policy Gradients
During the derivation of the policy performance gradient, a key step transforms the expression Σ [∂Pr_θ(τ)/∂θ] R(τ) into a form that includes the term ∂ log Pr_θ(τ)/∂θ. What is the primary analytical purpose of this transformation?
The following equations represent key steps in deriving the policy gradient. Arrange them in the correct logical order, starting from the initial gradient of the objective function and ending with its final form as an expectation. Note: J(θ) is the objective function, Pr_θ(τ) is the probability of a trajectory τ under policy parameters θ, and R(τ) is the reward for that trajectory.
Analyzing a Flawed Policy Gradient Derivation
Learn After
In policy gradient methods, the gradient of the performance objective is estimated as an expectation over trajectories. Each trajectory's contribution to this estimate is the product of its cumulative reward and the gradient of its log-probability. Given this structure, why can these methods effectively handle tasks with non-differentiable reward functions, such as a simple binary reward for winning or losing a game?
Applicability of Policy Gradients with Discrete Rewards
For a policy gradient method to be applicable, the cumulative reward function does not need to be differentiable: computing the gradient of the policy performance objective differentiates only the policy's log-probability, with the reward entering as a scalar weight.
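To ground this, here is a minimal REINFORCE-style sketch in Python. The setup is hypothetical (a one-step, two-action task with a made-up WIN_PROB table, not from the source); it illustrates that a binary win/lose reward appears in the update only as a scalar weight on ∂ log Pr_θ(τ)/∂θ and is never differentiated.

import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical one-step task: two actions, each with its own chance of "winning".
WIN_PROB = np.array([0.2, 0.8])

theta = np.zeros(2)  # policy parameters: softmax logits over the two actions
lr = 0.1

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                    # sample a trajectory (here: one action)
    reward = float(rng.random() < WIN_PROB[a])    # binary, non-differentiable R(τ)

    # Gradient of log Pr_θ(τ) for a softmax policy: one_hot(a) - probs.
    grad_log_prob = -probs
    grad_log_prob[a] += 1.0

    # REINFORCE update: the reward only scales the score function;
    # no derivative of the reward is ever computed.
    theta += lr * reward * grad_log_prob

print("learned action probabilities:", softmax(theta))

Run as written, the policy concentrates probability on the action with the higher win rate, even though the reward is a discontinuous 0/1 signal.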