Learn Before
Causality Principle in Policy Gradient Calculation
In reinforcement learning, the principle of causality dictates that an action taken at a specific time step can only affect rewards from that point forward, not those already received. As a result, rewards accumulated before time t are considered "fixed" or constant by the time the action a_t is chosen. This implies that the sum of past rewards does not influence the gradient of the policy at time t, a key insight used in deriving policy gradient algorithms.
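Below is a minimal sketch (not from the card itself; NumPy and the helper name reward_to_go are assumptions) of how this principle shows up in a REINFORCE-style estimator: the weight attached to the action at step t is the reverse cumulative sum of rewards, so rewards earned before t never enter it.

```python
import numpy as np

# Minimal sketch of the causality principle in a policy gradient estimator.
# The gradient of log pi(a_t | s_t) is scaled by the "reward-to-go" from
# step t onward; rewards received before t are constants with respect to
# a_t and drop out of the weight entirely.

def reward_to_go(rewards):
    """R_t = r_t + r_{t+1} + ... + r_T for every t (reverse cumulative sum)."""
    return np.cumsum(rewards[::-1])[::-1]

rewards = np.array([1.0, 2.0, 3.0])
print(reward_to_go(rewards))  # [6. 5. 3.] -- the weight for a_3 ignores r_1, r_2
```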
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Irrelevance of Past Rewards for Policy Gradient Calculation
An autonomous agent completes a task over four time steps. The sequence of actions and resulting rewards is as follows:
- Time t=1: Action a_1 -> Reward r_1 = 0
- Time t=2: Action a_2 -> Reward r_2 = 0
- Time t=3: Action a_3 -> Reward r_3 = -1
- Time t=4: Action a_4 -> Reward r_4 = +10

When evaluating the decision to take action a_2 at time t=2, which rewards should be considered as being potentially influenced by this specific action?
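As a quick check (a hypothetical sketch consistent with the episode above; the dict layout is an assumption, not from the card), the rewards that a_2 can influence are exactly those from t=2 onward:

```python
# Rewards from the four-step episode above, keyed by time step.
rewards = {1: 0, 2: 0, 3: -1, 4: 10}

t = 2  # evaluating action a_2
influenced = {step: r for step, r in rewards.items() if step >= t}
print(influenced)                # {2: 0, 3: -1, 4: 10} -- r_2, r_3, r_4
print(sum(influenced.values()))  # 9, the reward-to-go that scales a_2's gradient
```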
An agent is learning to play a video game. At time step t=5, the agent performs an action (e.g., jumping). According to the causality principle in this context, this specific action at t=5 can alter the reward that was already received at time step t=3.
Debugging a Policy Update Calculation
Learn After
Sum of Past Rewards Notation
Optimizing Gradient Calculation in a Learning Agent
In the derivation of a policy gradient algorithm, we aim to update a policy based on actions taken within an episode. A core principle states that an action taken at a specific time step, t, can only influence rewards received from that point forward (t' >= t). Given this principle, which of the following mathematical expressions correctly identifies the reward term that should be used to scale the gradient update for the action at time step t?
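As a sketch of the term this question points to (assuming standard notation, with T the final time step of the episode), the causality principle replaces the full return with the reward-to-go, so the gradient for the action at step t is scaled as:

```latex
\nabla_\theta \log \pi_\theta(a_t \mid s_t) \cdot \sum_{t'=t}^{T} r_{t'}
```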
Justification for Policy Gradient Simplification