Advantage Function as a Form of Shaped Reward
The value-based shaped reward, defined as r' = r + γV(s_{t+1}) - V(s_t), is mathematically identical to the one-step Temporal Difference (TD) error, δ_t = r + γV(s_{t+1}) - V(s_t), which is a common estimator of the advantage function. This equivalence ties advantage-based methods such as PPO directly to reward shaping: the advantage function can be interpreted as a specific instance of a shaped reward in which the value function V(s) serves as the shaping potential Φ(s) of potential-based reward shaping.
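A minimal numerical sketch of this identity (plain Python; the value estimates, reward, and discount factor below are made-up illustration values, not from the note): the shaped reward and the TD error are the same expression term for term, so they always agree.

```python
# Minimal sketch: the value-based shaped reward equals the one-step TD error.
# All numbers here are illustrative assumptions, not taken from the note.

gamma = 0.99

# Hypothetical value-function estimates for the current and next state.
v_current = 2.0   # V(s_t)
v_next = 3.0      # V(s_{t+1})
r = 0.5           # environmental reward for the transition

# Value-based shaped reward: r' = r + gamma * V(s_{t+1}) - V(s_t)
shaped_reward = r + gamma * v_next - v_current

# One-step TD error / advantage estimate: delta_t = r + gamma * V(s_{t+1}) - V(s_t)
td_error = r + gamma * v_next - v_current

assert shaped_reward == td_error  # identical expressions, identical values
print(shaped_reward)  # 1.47
```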
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Advantage Function as a Form of Shaped Reward
Calculating a Shaped Reward
An agent is being trained using value-based reward shaping. In a particular transition from state s_t to s_{t+1}, the agent receives an environmental reward r of 0. The agent's current value function estimates that the value of the next state, V(s_{t+1}), is substantially higher than the value of the current state, V(s_t). Based on the formula r' = r + γV(s_{t+1}) - V(s_t), what is the most likely consequence of this shaping on the agent's learning for this specific transition?
Analyze the value-based reward shaping formula, r' = r + γV(s_{t+1}) - V(s_t), by matching each component to its specific role or definition within the general structure of potential-based reward shaping.
An autonomous agent is navigating a maze. At a particular state, the agent's value function estimates the value of its current state to be 10. The agent decides to move to an adjacent state, receiving an immediate reward of -1 for the move. The value function estimates the value of the new state to be 15. Assuming a discount factor of 0.9, calculate the one-step advantage estimate for the action taken and determine its implication for future action selection. A worked sketch of this calculation appears just after this list.
Derivation of the Advantage Function Estimator
Evaluating an Agent's Action Choice
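As a worked sketch of the maze question above (all numbers come from the question text itself), the one-step advantage estimate is positive, meaning the action led somewhere better than the current value estimate predicted, so a policy update would make this action more likely in that state.

```python
# Worked sketch for the maze question above; numbers come from the question text.
gamma = 0.9          # discount factor
r = -1.0             # immediate reward for the move
v_current = 10.0     # V(s_t), value estimate of the current state
v_next = 15.0        # V(s_{t+1}), value estimate of the new state

# One-step advantage estimate: r + gamma * V(s_{t+1}) - V(s_t)
advantage = r + gamma * v_next - v_current
print(advantage)  # 2.5 -- positive, so the action looks better than expected
```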
Learn After
A reinforcement learning agent, in a state s_t with an estimated value V(s_t) = 50, takes an action. This action yields an immediate reward r = 5 and transitions the agent to a new state s_{t+1} with an estimated value V(s_{t+1}) = 40. Assuming a discount factor γ = 0.9, the agent's learning algorithm uses the quantity r + γV(s_{t+1}) - V(s_t) to update its policy. How should the agent interpret the outcome of this action? A worked sketch of this calculation appears at the end of this section.
Explaining Accelerated Learning in Reinforcement Learning
Equivalence of Advantage Estimation and Reward Shaping
In reinforcement learning, using the one-step advantage estimate, calculated as r + γV(s_{t+1}) - V(s_t), to update an agent's policy is a fundamentally distinct approach from training the agent with a shaped reward signal.
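A worked sketch of the "Learn After" question above (all numbers come from the question text itself): here the update quantity is negative, signalling an action that turned out worse than the baseline V(s_t) predicted.

```python
# Worked sketch for the Learn After question above; numbers come from the question text.
gamma = 0.9       # discount factor
r = 5.0           # immediate reward
v_current = 50.0  # V(s_t)
v_next = 40.0     # V(s_{t+1})

# Quantity used for the policy update: r + gamma * V(s_{t+1}) - V(s_t)
update_signal = r + gamma * v_next - v_current
print(update_signal)  # -9.0 -- negative, so the action was worse than expected
```

Read as an advantage estimate, the negative value says the policy update should make this action less likely; read as a shaped reward, it is exactly the same number, which is the equivalence this note describes.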