Short Answer

Differential Impact of Baselines on Variance

An AI developer is using a policy gradient algorithm and decides to add a baseline to stabilize training. They observe that the variance of the policy gradient estimates decreases, but the variance of the total rewards per episode remains unchanged. Explain the statistical reasoning behind this observation.

0

1

Updated 2025-10-05

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science