1Cademy - Benefit of a Baseline in a Positive-Reward Environment

Learn Before

Baseline Method for Policy Gradient Variance Reduction

Short Answer

A reinforcement learning agent is being trained in an environment where the total reward for any complete episode is always a large positive number, ranging from +500 to +1000. An engineer decides to modify the learning algorithm by subtracting a baseline value of 750 (the average reward) from the total reward before updating the policy. Explain why this modification is likely to improve the stability of the training process, even though all rewards are already positive.
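The effect the question asks about can be checked numerically. What follows is a minimal sketch, not the engineer's actual algorithm: it assumes a hypothetical two-action environment whose returns stay within +500 to +1000 (averaging 750), a one-parameter logistic policy, and a single-sample REINFORCE-style gradient estimate. The names `gradient_samples`, `theta`, and the specific return ranges per action are illustrative assumptions.

```python
import math
import random
import statistics


def gradient_samples(baseline, n=20000, seed=0):
    """Draw per-episode policy-gradient estimates (return - baseline) * score.

    Assumed toy setup (not from the original question):
      - one policy parameter theta, P(action=1) = sigmoid(theta)
      - action 1 yields a return uniform in [600, 1000]
      - action 0 yields a return uniform in [500, 900]
    so every return lies in [500, 1000] and the average return is 750.
    """
    rng = random.Random(seed)
    theta = 0.0
    p1 = 1.0 / (1.0 + math.exp(-theta))  # probability of choosing action 1
    samples = []
    for _ in range(n):
        action = 1 if rng.random() < p1 else 0
        if action == 1:
            ret = rng.uniform(600.0, 1000.0)
        else:
            ret = rng.uniform(500.0, 900.0)
        # d/dtheta log pi(action) for a logistic (Bernoulli) policy
        score = action - p1
        samples.append((ret - baseline) * score)
    return samples


no_baseline = gradient_samples(baseline=0.0)
with_baseline = gradient_samples(baseline=750.0)

mean_no_baseline = statistics.fmean(no_baseline)
mean_with_baseline = statistics.fmean(with_baseline)
var_no_baseline = statistics.variance(no_baseline)
var_with_baseline = statistics.variance(with_baseline)

if __name__ == "__main__":
    print("mean gradient, no baseline:  ", mean_no_baseline)
    print("mean gradient, baseline 750: ", mean_with_baseline)
    print("variance, no baseline:       ", var_no_baseline)
    print("variance, baseline 750:      ", var_with_baseline)
```

Under these assumptions both estimators target the same expected gradient, because subtracting a constant does not bias the estimate. Without the baseline, each sample is scaled by a large positive return, so whichever action happened to be sampled is pushed up strongly, and the useful signal comes only from small differences between large numbers. With the baseline of 750, above-average episodes produce positive updates and below-average episodes produce negative ones, which makes the per-sample estimates much less spread out.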

Updated 2025-10-03