Differential Impact of Baselines on Variance
An AI developer is using a policy gradient algorithm and decides to add a baseline to stabilize training. They observe that the variance of the policy gradient estimates decreases, but the variance of the total rewards per episode remains unchanged. Explain the statistical reasoning behind this observation.
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An engineer training an agent with a policy gradient method notices that the learning process is unstable due to high variance in the gradient estimates. To address this, they introduce a baseline which is subtracted from the rewards. What is the expected statistical consequence of this modification?
Differential Impact of Baselines on Variance
Introducing a baseline into a policy gradient algorithm is an effective technique for reducing the variance of the total rewards collected by the agent, which in turn stabilizes the learning process.