Baseline's Impact on Reward Variance vs. Gradient Estimate Variance
While introducing a baseline does not change the overall variance of the total rewards (subtracting a constant b leaves Var(R − b) = Var(R)), it is crucial for reducing the variance of the gradient estimates. Subtracting the baseline centers the rewards around zero, shrinking the magnitude of the factor that multiplies the score function and thereby reducing the variance of the product ∇θ log πθ(a | s) · (R − b). This makes the gradient estimates more stable, and because the baseline is constant with respect to the action, the gradient estimate remains unbiased.
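This distinction can be checked numerically. The sketch below uses a hypothetical one-parameter Gaussian policy (a ~ N(θ, 1), whose score function is simply a − θ) and a made-up reward with a large positive offset; the specific reward model and constants are illustrative assumptions, not from the source. Subtracting the mean reward as a constant baseline leaves the reward variance unchanged but collapses the variance of the score-times-reward product:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-parameter Gaussian policy: a ~ N(theta, 1).
# Score function: d/dtheta log pi(a) = a - theta.
theta = 0.0
n = 100_000
actions = rng.normal(theta, 1.0, size=n)
score = actions - theta

# Illustrative reward: large positive offset plus action-dependent noise.
rewards = 100.0 + 2.0 * actions + rng.normal(0.0, 1.0, size=n)

baseline = rewards.mean()  # constant baseline b

# Subtracting a constant does not change the reward variance.
print(np.var(rewards), np.var(rewards - baseline))  # identical

# But it sharply reduces the variance of the gradient estimate,
# the product of the score and the (centered) reward.
g_no_baseline = score * rewards
g_baseline = score * (rewards - baseline)
print(np.var(g_no_baseline), np.var(g_baseline))
```

With these numbers the per-sample gradient variance drops by several orders of magnitude, because the ~100-point reward offset no longer multiplies the score term.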
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Policy Gradient Estimate with Baseline
Baseline's Role in Centering Rewards and Reducing Gradient Variance
State-Value Function as a Baseline
An engineer is training two reinforcement learning agents (Agent A and Agent B) on the same task using a policy gradient method. The environment has a wide range of possible total rewards, from highly negative to highly positive. Agent A's learning algorithm directly uses the total reward received after each episode to update its policy. Agent B's algorithm first subtracts a constant value (equal to the average total reward observed so far) from the total reward before using it for the update. What is the most likely difference in the training process between Agent A and Agent B?
Benefit of a Baseline in a Positive-Reward Environment
A reinforcement learning agent is being trained in a specialized environment where the total reward for any complete episode consistently falls within a narrow range of 95 to 105. The training algorithm uses a policy gradient method and incorporates a baseline by subtracting the long-term average reward (approximately 100) from each episode's total reward before performing an update. Which statement best evaluates the utility of this baseline in this specific scenario?
Learn After
An engineer training an agent with a policy gradient method notices that the learning process is unstable due to high variance in the gradient estimates. To address this, they introduce a baseline which is subtracted from the rewards. What is the expected statistical consequence of this modification?
Differential Impact of Baselines on Variance
Introducing a baseline into a policy gradient algorithm is an effective technique for reducing the variance of the total rewards collected by the agent, which in turn stabilizes the learning process.