In the context of improving a policy gradient estimator, the total reward for a trajectory, \( \sum_{k=1}^{T} r_k \), is often rewritten inside the gradient calculation for a specific timestep \(t\) as \( \sum_{k=1}^{t-1} r_k + \sum_{k=t}^{T} r_k \). This decomposition is an algebraic identity, so by itself it does not alter the expected value of the gradient estimate; its purpose is to isolate the past-reward component, which can then be dropped without introducing bias while reducing variance.
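For reference, the standard argument for why the past-reward component can be dropped rests on the score-function identity (a well-known derivation, not part of the original card):

```latex
% The score function has zero mean under the policy:
\mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t)}
  \bigl[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\bigr]
  = \sum_{a} \pi_\theta(a \mid s_t)\,
    \frac{\nabla_\theta \pi_\theta(a \mid s_t)}{\pi_\theta(a \mid s_t)}
  = \nabla_\theta \sum_{a} \pi_\theta(a \mid s_t)
  = \nabla_\theta 1 = 0.
% The rewards r_1, \dots, r_{t-1} are determined before a_t is sampled,
% so they act as a constant with respect to a_t, giving
\mathbb{E}\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)
  \sum_{k=1}^{t-1} r_k \right] = 0.
% Dropping \sum_{k=1}^{t-1} r_k therefore leaves the gradient estimate
% unbiased while reducing its variance ("reward-to-go").
```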
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Policy Gradient with Reward-to-Go and Baseline
In a method for training a decision-making agent, an update rule is derived. Consider the following intermediate expression used to calculate the gradient for a single trajectory of states, actions, and rewards:

\[ \nabla_\theta \log \pi_\theta(a_t \mid s_t) \left( \sum_{k=1}^{t-1} r_k + \sum_{k=t}^{T} r_k - b \right) \]

Here, \(t\) is a specific timestep within the trajectory of length \(T\), \(\pi_\theta(a_t|s_t)\) is the probability of taking action \(a_t\) in state \(s_t\), \(r_k\) is the reward at timestep \(k\), and \(b\) is a constant baseline value. Which statement best analyzes the relationship between the policy term for timestep \(t\), \( \nabla_\theta \log \pi_\theta(a_t|s_t) \), and the two components of the reward sum?

Rationale for Reward Decomposition
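A minimal numerical sketch of this rationale (a toy two-step problem with a Bernoulli policy, hypothetical and not part of the original card): the full-return and reward-to-go estimators agree in expectation, but reward-to-go has lower variance.

```python
import math
import random
import statistics

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate(theta, T=2, n=200_000, seed=0):
    """Compare full-return vs reward-to-go policy gradient estimators.

    Toy setup: a single Bernoulli policy with p = sigmoid(theta) picks
    a_t in {0, 1} at each of T steps, and the reward is r_t = a_t.
    For a Bernoulli policy with logit theta, grad log pi(a_t) = a_t - p.
    """
    rng = random.Random(seed)
    p = sigmoid(theta)
    full, rtg = [], []
    for _ in range(n):
        a = [1 if rng.random() < p else 0 for _ in range(T)]
        r = a[:]                      # r_t = a_t in this toy problem
        R = sum(r)                    # total trajectory return
        score = [a_t - p for a_t in a]
        # Full-return estimator: every timestep weighted by the whole return.
        full.append(sum(s * R for s in score))
        # Reward-to-go estimator: timestep t weighted only by sum_{k>=t} r_k.
        rtg.append(sum(score[t] * sum(r[t:]) for t in range(T)))
    return (statistics.mean(full), statistics.variance(full),
            statistics.mean(rtg), statistics.variance(rtg))

theta = 0.3
p = sigmoid(theta)
true_grad = 2 * p * (1 - p)   # d/dtheta E[r_1 + r_2] = d/dtheta 2p
m_full, v_full, m_rtg, v_rtg = simulate(theta)
print(f"true gradient : {true_grad:.4f}")
print(f"full return   : mean {m_full:.4f}, var {v_full:.4f}")
print(f"reward-to-go  : mean {m_rtg:.4f}, var {v_rtg:.4f}")
```

Both sample means converge to the same true gradient, illustrating that dropping the past-reward component leaves the estimate unbiased; the variance printed for the reward-to-go estimator is strictly smaller.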