State-Value Function as a Baseline
A common and effective strategy for setting the baseline, $b$, in policy gradient methods is to use the state-value function, $V^{\pi}(s)$. This function represents the expected cumulative future reward from a given state $s$, formally defined as $V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \mid s_t = s\right]$. Using the value of the current state as a baseline helps to center the rewards and can significantly reduce the variance of the gradient estimate.
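To make this concrete, here is a minimal NumPy sketch (not from the original note) of one-step REINFORCE with a learned state-value baseline. The toy environment, reward shape, and learning rates are all illustrative assumptions; the point is only that each update weights $\nabla_{\theta} \log \pi(a \mid s)$ by the centered return $G - V(s)$.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
theta = np.zeros((n_states, n_actions))  # softmax policy logits
V = np.zeros(n_states)                   # learned state-value baseline

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

alpha_pi, alpha_v = 0.1, 0.1

for episode in range(5000):
    s = rng.integers(n_states)
    probs = softmax(theta[s])
    a = rng.choice(n_actions, p=probs)

    # Illustrative reward: a large state-dependent offset (which the baseline
    # should absorb) plus a small action-dependent bonus and some noise.
    G = 10.0 * s + (1.0 if a == s % 2 else 0.0) + rng.normal(0.0, 0.5)

    advantage = G - V[s]          # centered return: G - b(s), with b(s) = V(s)

    grad_log_pi = -probs          # grad of log pi(a|s) for a softmax policy
    grad_log_pi[a] += 1.0
    theta[s] += alpha_pi * advantage * grad_log_pi

    V[s] += alpha_v * (G - V[s])  # regress V(s) toward observed returns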
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Policy Gradient Estimate with Baseline
Baseline's Role in Centering Rewards and Reducing Gradient Variance
State-Value Function as a Baseline
Baseline's Impact on Reward Variance vs. Gradient Estimate Variance
An engineer is training two reinforcement learning agents (Agent A and Agent B) on the same task using a policy gradient method. The environment has a wide range of possible total rewards, from highly negative to highly positive. Agent A's learning algorithm directly uses the total reward received after each episode to update its policy. Agent B's algorithm first subtracts a baseline, equal to the running average of total rewards observed so far, from the total reward before using it for the update. What is the most likely difference in the training process between Agent A and Agent B?
Benefit of a Baseline in a Positive-Reward Environment
A reinforcement learning agent is being trained in a specialized environment where the total reward for any complete episode consistently falls within a narrow range of 95 to 105. The training algorithm uses a policy gradient method and incorporates a baseline by subtracting the long-term average reward (approximately 100) from each episode's total reward before performing an update. Which statement best evaluates the utility of this baseline in this specific scenario?
Learn After
Advantage Function Definition
In a reinforcement learning algorithm, a baseline is subtracted from the total reward to stabilize the learning process. Consider two different baseline strategies:
Strategy 1: Use a single, fixed value for the baseline, such as the average total reward calculated over many past episodes.
Strategy 2: Use a dynamic value for the baseline, equal to the expected future reward from the agent's current state (i.e., the state-value function).
Why is Strategy 2 generally more effective at reducing the variance of the policy updates compared to Strategy 1? (A small simulation sketch illustrating the difference appears at the end of this note.)
Evaluating Actions with a State-Value Baseline
Analyzing the Impact of a State-Value Baseline
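As a hedged illustration of the Strategy 1 vs. Strategy 2 question above, the following NumPy sketch compares the variance of centered returns under a single fixed baseline versus a state-dependent one. The two-state return distribution is an invented example, not from the source.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented example: returns depend strongly on the start state and only
# weakly on anything else, so a per-state baseline can cancel most variance.
n = 100_000
states = rng.integers(0, 2, size=n)
returns = np.where(states == 0, 0.0, 100.0) + rng.normal(0.0, 1.0, size=n)

const_b = returns.mean()                    # Strategy 1: one fixed value (~50)
value_b = np.where(states == 0, 0.0, 100.0) # Strategy 2: V(s) for each state

print("Var with fixed baseline:      ", np.var(returns - const_b))  # ~2501
print("Var with state-value baseline:", np.var(returns - value_b))  # ~1
```

The fixed baseline leaves the between-state spread of returns untouched, while the state-value baseline removes it entirely, which is exactly why Strategy 2 yields lower-variance policy updates.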