Analyzing the Impact of a State-Value Baseline
In a policy gradient algorithm, the state-value function is often used as a baseline to reduce the variance of gradient estimates. Explain how subtracting the state value (the expected future reward from the current state) from the observed return helps to distinguish a 'good' action taken in a 'bad' state from a 'mediocre' action taken in a 'good' state. How does this distinction lead to a more stable learning signal?
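For concreteness, here is a minimal numeric sketch in plain Python of the advantage A(s, a) = G - V(s), the observed return minus the state value, for the two situations named above. All reward and state-value numbers are hypothetical, chosen only for illustration:

    # Hypothetical numbers for illustration; advantage = return G minus state value V(s).

    # A 'good' action in a 'bad' state: the state promises little (V = -5),
    # but the action does better than expected (G = -1).
    V_bad_state = -5.0
    G_good_action = -1.0
    advantage_good_in_bad = G_good_action - V_bad_state            # +4.0 -> reinforce

    # A 'mediocre' action in a 'good' state: the state promises a lot (V = +10),
    # but the action underdelivers (G = +6).
    V_good_state = 10.0
    G_mediocre_action = 6.0
    advantage_mediocre_in_good = G_mediocre_action - V_good_state  # -4.0 -> discourage

    print(advantage_good_in_bad, advantage_mediocre_in_good)       # 4.0 -4.0

Without the baseline, the raw returns (-1 versus +6) would push the policy toward the mediocre action; subtracting V(s) credits each action relative to what its state already promised.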
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Advantage Function Definition
In a reinforcement learning algorithm, a baseline is subtracted from the total reward to stabilize the learning process. Consider two different baseline strategies:
Strategy 1: Use a single, fixed value for the baseline, such as the average total reward calculated over many past episodes.
Strategy 2: Use a dynamic value for the baseline that is equal to the expected future reward from the agent's current state.
Why is Strategy 2 generally more effective than Strategy 1 at reducing the variance of the policy updates? (A toy numeric sketch contrasting the two strategies appears at the end of this section.)
Evaluating Actions with a State-Value Baseline
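The baseline-strategy question quoted above (Strategy 1 versus Strategy 2) lends itself to a quick empirical check. Below is a minimal sketch in plain Python; the two-state toy environment, its reward scales, and the noise level are all assumptions made for illustration, and the true state values are taken as known rather than learned. It compares the variance of the centered learning signal under the two strategies:

    # Toy two-state setup with hypothetical numbers: compare the variance of the
    # centered signal (G - baseline) under a fixed vs. a state-value baseline.
    import random

    random.seed(0)
    STATE_VALUE = {"low": 1.0, "high": 10.0}   # assumed-known expected return per state

    def sample_return(state):
        # Observed return = expected return for the state plus unit-variance noise.
        return STATE_VALUE[state] + random.gauss(0.0, 1.0)

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    fixed_baseline = sum(STATE_VALUE.values()) / 2   # Strategy 1: one global average
    signals_fixed, signals_state = [], []
    for _ in range(10_000):
        s = random.choice(list(STATE_VALUE))
        G = sample_return(s)
        signals_fixed.append(G - fixed_baseline)     # Strategy 1: fixed baseline
        signals_state.append(G - STATE_VALUE[s])     # Strategy 2: V(s) baseline

    print("fixed baseline variance:      ", variance(signals_fixed))  # ~ noise + state spread
    print("state-value baseline variance:", variance(signals_state))  # ~ noise only

Because the fixed baseline cannot track which state the agent is in, the gap between the two states leaks into the centered signal; the state-dependent baseline removes that gap and leaves only the per-state noise.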