1Cademy - An engineer training an agent with a policy gradient method notices that the learning process is unstable due to high variance in the gradient estimates. To address this, they introduce a baseline which is subtracted from the rewards. What is the expected statistical consequence of this modification?

Learn Before

Baseline's Impact on Reward Variance vs. Gradient Estimate Variance

Multiple Choice

An engineer training an agent with a policy gradient method notices that the learning process is unstable due to high variance in the gradient estimates. To address this, they introduce a baseline which is subtracted from the rewards. What is the expected statistical consequence of this modification?

Updated 2025-09-28

Contributors are:

Who are from:

Learn Before

Related