1Cademy - A reinforcement learning agent is being trained using a policy gradient method. During training, the agents performance is highly erratic, and the estimated gradients for policy updates have very high variance. Which of the following changes to the gradient estimation process is most directly aimed at stabilizing learning by reducing this variance?

Learn Before

Policy Gradient Reformulation using Advantage Function

Multiple Choice

A reinforcement learning agent is being trained using a policy gradient method. During training, the agent's performance is highly erratic, and the estimated gradients for policy updates have very high variance. Which of the following changes to the gradient estimation process is most directly aimed at stabilizing learning by reducing this variance?

Updated 2025-10-03

Contributors are:

Who are from:

Learn Before

Related