Learn Before
Concept

Causality Principle in Policy Gradient Calculation

In reinforcement learning, the principle of causality dictates that an action taken at a specific time step tt can only affect rewards from that point forward, not those already received. As a result, rewards accumulated before time tt are considered "fixed" or constant by the time the action at tt is chosen. This implies that the sum of past rewards does not influence the gradient of the policy at time tt, a key insight used in deriving policy gradient algorithms.

0

1

Updated 2025-10-07

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences