1Cademy - Purpose of Reward Decomposition in Policy Gradient

Learn Before

Derivation of Reward Decomposition in Policy Gradient with Baseline

Short Answer

A key step in reformulating the policy gradient expression is rewriting the total reward of a trajectory, $\sum_{k=1}^{T} r_k$, as the sum of two components: the rewards accumulated before a given timestep $t$, $\sum_{k=1}^{t-1} r_k$, and the rewards accumulated from that timestep onward, $\sum_{k=t}^{T} r_k$. Explain the primary motivation for this decomposition. What does separating the rewards in this way make possible in the subsequent steps of the gradient calculation?
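
To make the split concrete, here is a minimal Python sketch that computes both components for a single trajectory. The function name `split_rewards` and the reward values are hypothetical, chosen only for illustration. The sketch shows the arithmetic of the decomposition and is not part of the original derivation.

```python
def split_rewards(rewards, t):
    """Split a trajectory's rewards at the 1-indexed timestep t.

    Returns (past, future), where
      past   = r_1 + ... + r_{t-1}  (rewards before timestep t)
      future = r_t + ... + r_T      (rewards from timestep t onward)
    """
    if not 1 <= t <= len(rewards):
        raise ValueError("t must satisfy 1 <= t <= T")
    past = sum(rewards[: t - 1])    # list indices 0 .. t-2
    future = sum(rewards[t - 1 :])  # list indices t-1 .. T-1
    return past, future


# Hypothetical rewards r_1..r_4 for one trajectory (assumed values).
rewards = [1.0, 0.0, 2.0, 3.0]
total = sum(rewards)

# For every timestep t, the two components add back up to the total.
for t in range(1, len(rewards) + 1):
    past, future = split_rewards(rewards, t)
    assert past + future == total
    print(f"t={t}: past={past}, future={future}")
```

Whatever `t` is chosen, the two parts always sum to the same total; only the way the total is divided between past and future changes with `t`.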

Updated 2025-10-04
