Learn Before
Unbiased Nature of Policy Gradient with Baseline
A crucial property of using a baseline in policy gradient methods is that it does not introduce any bias into the gradient estimate. While the baseline reduces the variance of the gradient, its expected value remains unchanged. This ensures that, on average, the policy updates still move in the correct direction to improve the policy.
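A quick way to see both claims numerically is a Monte Carlo check on a toy problem. The sketch below uses a hypothetical two-armed bandit with a softmax policy (all names and numbers are illustrative, not from this card): it estimates the policy gradient with and without a baseline, and the two means agree while the variance drops.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-armed bandit: softmax policy over actions {0, 1},
# deterministic reward per action (illustrative values only).
theta = np.array([0.2, -0.1])    # policy logits
rewards = np.array([1.0, 0.0])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def grad_estimates(baseline, n=200_000):
    """Monte Carlo samples of ∇θ log πθ(a) * (r(a) - baseline)."""
    pi = softmax(theta)
    a = rng.choice(2, size=n, p=pi)
    # For a softmax policy, ∇θ log πθ(a) = one_hot(a) - π.
    score = np.eye(2)[a] - pi
    return score * (rewards[a] - baseline)[:, None]

g_plain = grad_estimates(baseline=0.0)
g_base = grad_estimates(baseline=0.5)   # baseline near the mean reward

# Same mean gradient (unbiased), but much lower variance with the baseline.
print(g_plain.mean(axis=0), g_base.mean(axis=0))
print(g_plain.var(axis=0).sum(), g_base.var(axis=0).sum())
```

With these illustrative numbers the total variance falls by more than an order of magnitude while the estimated mean gradient is unchanged up to sampling noise.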
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Derivation of Reward Decomposition in Policy Gradient with Baseline
Unbiased Nature of Policy Gradient with Baseline
In a reinforcement learning task, an agent completes two distinct trajectories. Trajectory A results in a total reward of +20, and Trajectory B results in a total reward of +5. To update the agent's policy, a baseline value of +12 is subtracted from each trajectory's total reward. Based on this information, how will the policy updates derived from these two trajectories differ?
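The arithmetic behind this question, using only the numbers it gives, can be sketched as:

```python
baseline = 12.0
returns = {"A": 20.0, "B": 5.0}

# Advantage = total reward minus baseline; its sign sets the update direction.
advantages = {k: r - baseline for k, r in returns.items()}
print(advantages)  # {'A': 8.0, 'B': -7.0}
```

Trajectory A's actions are reinforced (advantage +8), while Trajectory B's actions are discouraged (advantage −7) even though its raw reward was positive.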
Consider the formula for the policy gradient estimate with a baseline:

∇θ J(θ) ≈ E[ Σ_t ∇θ log πθ(a_t|s_t) · (r_t − b) ]

According to this formula, the baseline value b is subtracted from the reward r_t at each individual timestep t within a trajectory to reduce variance.
Stabilizing Policy Gradient Training
Learn After
Analysis of the Baseline's Effect on Policy Gradient Expectation
In a policy gradient algorithm, a common technique to stabilize learning is to subtract a calculated value from the total reward of each trajectory before computing the update. This is done to reduce the variability of the updates without altering their expected direction. Which of the following calculated values, if subtracted from the total reward, would introduce an incorrect bias and potentially lead the policy updates in the wrong direction on average?
In the mathematical proof demonstrating that a state-dependent baseline b(s_t) does not introduce bias to the policy gradient estimate, the expected value of the baseline-related term, E[ (∇θ log πθ(a_t|s_t)) · b(s_t) ], evaluates to zero. Which of the following is the fundamental reason for this outcome?
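The derivation this question refers to can be written out for discrete actions (the continuous case replaces the sum with an integral); the key step is that the probabilities sum to one, so the gradient of their sum is zero:

```latex
\mathbb{E}_{a_t \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, b(s_t)\right]
= b(s_t) \sum_{a} \pi_\theta(a \mid s_t)\, \nabla_\theta \log \pi_\theta(a \mid s_t)
= b(s_t) \sum_{a} \nabla_\theta \pi_\theta(a \mid s_t)
= b(s_t)\, \nabla_\theta \sum_{a} \pi_\theta(a \mid s_t)
= b(s_t)\, \nabla_\theta 1
= 0 .
```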