Baseline's Role in Centering Rewards and Reducing Gradient Variance
The subtraction of a baseline, b, from the total reward, R, serves to center the reward values. For instance, if the baseline is defined as the expected total reward, b = E[R], this operation centers the rewards around zero. This centering is the direct mechanism for variance reduction: it shrinks the magnitude of the product term (R - b) ∇ log π(a|s), which is used to estimate the policy gradient, so the estimate fluctuates far less from episode to episode.
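To make this concrete, here is a minimal sketch, not taken from the source: a toy two-action bandit with a softmax policy, all-positive rewards near 100, and a baseline of 100 (roughly the expected reward). Every name and number in it is an illustrative assumption; it only demonstrates that centering leaves the expected gradient unchanged while cutting its variance.

```python
# Minimal sketch (illustrative assumptions throughout): compare the variance
# of a REINFORCE-style gradient estimate with and without a baseline.
import numpy as np

rng = np.random.default_rng(0)

theta = 0.3  # single logit parameterizing a softmax policy over two actions

def action_probs(theta):
    logits = np.array([theta, 0.0])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def grad_log_pi(action, theta):
    # d/dtheta log pi(action): 1 - pi(0) for action 0, -pi(0) for action 1.
    p = action_probs(theta)
    return (1.0 if action == 0 else 0.0) - p[0]

# Large, all-positive rewards, so centering matters.
mean_rewards = np.array([101.0, 99.0])

def sample_grad(baseline, n=10_000):
    p = action_probs(theta)
    actions = rng.choice(2, size=n, p=p)
    rewards = mean_rewards[actions] + rng.normal(0.0, 1.0, size=n)
    # The product term (R - b) * grad log pi, per sampled episode.
    g = (rewards - baseline) * np.array([grad_log_pi(a, theta) for a in actions])
    return g.mean(), g.var()

mean_no_b, var_no_b = sample_grad(baseline=0.0)
mean_b, var_b = sample_grad(baseline=100.0)  # roughly the expected reward

print(f"no baseline : mean={mean_no_b:+.4f} var={var_no_b:10.2f}")
print(f"b = 100     : mean={mean_b:+.4f} var={var_b:10.2f}")
```

With these settings, the two mean estimates agree up to sampling noise, while the centered estimator's variance is orders of magnitude smaller.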

Related
Policy Gradient Estimate with Baseline
State-Value Function as a Baseline
Baseline's Impact on Reward Variance vs. Gradient Estimate Variance
An engineer is training two reinforcement learning agents (Agent A and Agent B) on the same task using a policy gradient method. The environment has a wide range of possible total rewards, from highly negative to highly positive. Agent A's learning algorithm directly uses the total reward received after each episode to update its policy. Agent B's algorithm first subtracts a constant value (equal to the average total reward observed so far) from the total reward before using it for the update. What is the most likely difference in the training process between Agent A and Agent B?
Benefit of a Baseline in a Positive-Reward Environment
A reinforcement learning agent is being trained in a specialized environment where the total reward for any complete episode consistently falls within a narrow range of 95 to 105. The training algorithm uses a policy gradient method and incorporates a baseline by subtracting the long-term average reward (approximately 100) from each episode's total reward before performing an update. Which statement best evaluates the utility of this baseline in this specific scenario?
Learn After
In a policy gradient algorithm, the update for the policy parameters is influenced by the term (R - b), where R is the total reward for an episode and b is a baseline. Imagine you are training an agent where most episodes yield a small, positive total reward (e.g., between 1 and 5). If you set the baseline b to a constant, large positive value (e.g., 10), what is the most likely consequence for the learning process?
Diagnosing Training Instability
In policy gradient methods, subtracting a baseline from the total reward is a technique used to reduce gradient variance. A key property of a properly chosen baseline is that it does not alter the expected value of the policy gradient: the estimate remains unbiased, while its variance, and hence the noisiness of the updates, is reduced.
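A standard one-line derivation, added here for completeness rather than taken from the source, shows why a constant baseline introduces no bias: the score function has zero mean under the policy.

```latex
% The baseline term vanishes in expectation because
% \sum_a \pi_\theta(a) = 1 for every \theta.
\mathbb{E}_{a \sim \pi_\theta}\!\left[ b \, \nabla_\theta \log \pi_\theta(a) \right]
  = b \sum_a \pi_\theta(a) \, \frac{\nabla_\theta \pi_\theta(a)}{\pi_\theta(a)}
  = b \, \nabla_\theta \sum_a \pi_\theta(a)
  = b \, \nabla_\theta 1
  = 0
```

Hence subtracting b changes the variance of the product term (R - b) ∇ log π, but not its expectation.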