1Cademy - Differential Impact of Baselines on Variance

Learn Before

Baseline's Impact on Reward Variance vs. Gradient Estimate Variance

Short Answer

Differential Impact of Baselines on Variance

An AI developer is using a policy gradient algorithm and decides to add a baseline to stabilize training. They observe that the variance of the policy gradient estimates decreases, but the variance of the total rewards per episode remains unchanged. Explain the statistical reasoning behind this observation.

Updated 2025-10-05

Contributors are:

Who are from:

Learn Before

Related