Learn Before
Analysis of the Baseline's Effect on Policy Gradient Expectation
In policy gradient methods, a baseline term, which typically depends only on the state, is subtracted from the return (the cumulative reward) to reduce the variance of the gradient estimate. Explain mathematically why the expected gradient contribution from such a baseline is zero, so that subtracting it introduces no bias into the overall policy gradient estimate. Your explanation should focus on the properties of the score function (the gradient of the log-policy).
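A minimal sketch of the standard argument, written in LaTeX and assuming a discrete action space with a baseline b(s_t) that depends only on the state (continuous actions follow the same steps with integrals in place of sums):

\[
\mathbb{E}_{a_t \sim \pi_\theta(\cdot\mid s_t)}\big[\nabla_\theta \log \pi_\theta(a_t\mid s_t)\, b(s_t)\big]
= b(s_t) \sum_{a} \pi_\theta(a\mid s_t)\, \nabla_\theta \log \pi_\theta(a\mid s_t)
= b(s_t) \sum_{a} \nabla_\theta \pi_\theta(a\mid s_t)
= b(s_t)\, \nabla_\theta \sum_{a} \pi_\theta(a\mid s_t)
= b(s_t)\, \nabla_\theta 1
= 0.
\]

The two facts doing the work are that b(s_t) does not depend on a_t, so it factors out of the expectation over actions, and that the policy's probabilities sum to one for every θ, so the gradient of that sum vanishes.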
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of the Baseline's Effect on Policy Gradient Expectation
In a policy gradient algorithm, a common technique to stabilize learning is to subtract a calculated value from the total reward of each trajectory before computing the update. This is done to reduce the variability of the updates without altering their expected direction. Which of the following calculated values, if subtracted from the total reward, would bias the estimate and potentially lead the policy updates in the wrong direction on average?
In the mathematical proof demonstrating that a state-dependent baseline b(s_t) does not introduce bias to the policy gradient estimate, the expected value of the baseline-related term, E[ (∇θ log πθ(a_t|s_t)) * b(s_t) ], evaluates to zero. Which of the following is the fundamental reason for this outcome?
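By contrast, and under the same assumptions as the sketch above, a subtracted value that depends on the chosen action, say b(a_t), cannot be factored out of the sum over actions:

\[
\mathbb{E}_{a_t \sim \pi_\theta(\cdot\mid s_t)}\big[\nabla_\theta \log \pi_\theta(a_t\mid s_t)\, b(a_t)\big]
= \sum_{a} \nabla_\theta \pi_\theta(a\mid s_t)\, b(a)
= \nabla_\theta\, \mathbb{E}_{a_t \sim \pi_\theta(\cdot\mid s_t)}\big[b(a_t)\big],
\]

which is nonzero in general because the expectation of b(a_t) shifts as θ changes. This is the sense in which an action-dependent subtracted value biases the expected update direction, while a state-dependent baseline does not.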