True/False

Consider the formula for the policy gradient estimate with a baseline:

$$\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{|D|} \sum_{\tau \in D} \left( \frac{\partial}{\partial \theta} \sum_{t=1}^{T} \log \pi_{\theta}(a_t|s_t) \right) \left( \sum_{t=1}^{T} r_t - b \right)$$

According to this formula, the baseline value b is subtracted from the reward r_t at each individual timestep t within a trajectory to reduce variance.
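To see how the two readings of the formula differ, here is a minimal Python sketch (not from the source; the reward values and baseline are made up for illustration). It contrasts subtracting b once from the trajectory's total return, as the formula above writes it, with subtracting b from every individual reward, as the statement describes.

```python
# Minimal sketch with hypothetical values (not from the source):
# contrast subtracting the baseline b once from a trajectory's total
# return with subtracting it from every per-timestep reward.

rewards = [1.0, 0.5, 2.0]  # hypothetical r_1..r_T for one trajectory (T = 3)
b = 0.8                    # hypothetical baseline value

# As the formula is written: b is subtracted once from the total return.
return_minus_b = sum(rewards) - b          # 3.5 - 0.8 = 2.7

# As the statement reads it: b is subtracted at each timestep,
# which is equivalent to subtracting T * b from the total return.
per_step = sum(r - b for r in rewards)     # 3.5 - 3 * 0.8 = 1.1

print(return_minus_b)  # 2.7
print(per_step)        # 1.1 -- a different quantity whenever T > 1
```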


Updated 2025-10-03


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science