1Cademy - In the derivation of a policy gradient algorithm, we aim to update a policy based on actions taken within an episode. A core principle states that an action taken at a specific time step, $t$, can only influence rewards received from that point forward ($r_t, r_{t+1}, ...$). Given this principle, which of the following mathematical expressions correctly identifies the reward term that should be used to scale the gradient update for the action at time step $t$?

Learn Before

Causality Principle in Policy Gradient Calculation

Multiple Choice

In the derivation of a policy gradient algorithm, we aim to update a policy based on actions taken within an episode. A core principle states that an action taken at a specific time step, $t$ , can only influence rewards received from that point forward ( $r_t, r_{t+1}, ...$ ). Given this principle, which of the following mathematical expressions correctly identifies the reward term that should be used to scale the gradient update for the action at time step $t$ ?

Updated 2025-10-07

Contributors are:

Who are from:

Learn Before

Related