Learn Before
Advantage Function Estimation using Reward-to-Go
The advantage at a time step t, denoted as A(s_t, a_t), quantifies the relative benefit of taking a specific action a_t compared to the expected value of following the policy from state s_t onward. It can be estimated by subtracting a baseline from the actual return. Using the state-value function V(s_t) as the baseline, the formula is:

A(s_t, a_t) = (∑_{k=t}^{T} r_k) - V(s_t)

In this equation, the term ∑_{k=t}^{T} r_k represents the actual return (the reward-to-go) received from time step t, while V(s_t) represents the expected return from state s_t.
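To make the estimate concrete, here is a minimal Python sketch, assuming an undiscounted finite-horizon trajectory and a baseline that already supplies V(s_t) predictions. The function names (reward_to_go, estimate_advantages) and the numbers are illustrative, not from the course.

```python
def reward_to_go(rewards):
    """Compute the reward-to-go, sum_{k=t}^{T} r_k, for every time step t."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate rewards from the end of the trajectory backwards.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

def estimate_advantages(rewards, values):
    """A(s_t, a_t) = (sum_{k=t}^{T} r_k) - V(s_t), with V(s_t) given by the baseline."""
    rtg = reward_to_go(rewards)
    return [g - v for g, v in zip(rtg, values)]

# Example: a 4-step trajectory and a (hypothetical) learned value baseline.
rewards = [1.0, 0.0, 2.0, 1.0]   # r_t observed at each step
values  = [3.5, 2.0, 2.5, 0.5]   # V(s_t) predicted by the baseline
print(estimate_advantages(rewards, values))  # [0.5, 1.0, 0.5, 0.5]
```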

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Policy Gradient Reformulation using Advantage Function
Advantage Function Estimation using Reward-to-Go
An autonomous agent in a reinforcement learning environment is in a particular state. From this state, the expected cumulative future reward, when averaged across all possible actions, is calculated to be 50 points. The agent is evaluating three specific actions:
- Action X: The expected cumulative reward for taking this action is 65 points.
- Action Y: The expected cumulative reward for taking this action is 40 points.
- Action Z: The expected cumulative reward for taking this action is 50 points.
Based on this information, which statement provides the most accurate analysis for guiding the agent's next policy update?
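The arithmetic this question rests on can be checked directly with the advantage formula above; a small sketch, where the action values come from the question and the variable names are illustrative:

```python
# Advantage of each action: A(s, a) = Q(s, a) - V(s), with V(s) = 50.
V = 50
Q = {"X": 65, "Y": 40, "Z": 50}
advantages = {a: q - V for a, q in Q.items()}
print(advantages)  # {'X': 15, 'Y': -10, 'Z': 0}
```

Under an advantage-weighted policy gradient, the update raises the probability of actions with positive advantage and lowers it for actions with negative advantage.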
In a reinforcement learning scenario, an agent in a specific state calculates that the 'advantage' of performing a particular action is exactly zero. What is the most accurate interpretation of this finding?
Temporal Difference (TD) Error as an Advantage Function Estimator
Analysis of an Agent's Suboptimal Policy
Learn After
Policy Gradient with Reward-to-Go and Baseline
Calculating Advantage from a Trajectory
In the context of estimating the advantage of taking an action a_t in a state s_t, the formula A(s_t, a_t) = (∑_{k=t}^{T} r_k) - V(s_t) is often used. What is the primary role of the reward-to-go term, ∑_{k=t}^{T} r_k, within this specific estimation?

In a given trajectory, if the calculated advantage A(s_t, a_t) = (∑_{k=t}^{T} r_k) - V(s_t) is negative, it implies that the action a_t taken in state s_t led to a sequence of rewards that was worse than the average expected outcome from that state.
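A quick numeric check of this negative-advantage case, with made-up rewards and a made-up baseline value:

```python
# Hypothetical 3-step tail of a trajectory starting at s_t.
rewards_from_t = [0.0, 1.0, 0.0]   # r_t, r_{t+1}, r_{t+2}
V_s_t = 3.0                        # baseline's expected return from s_t
advantage = sum(rewards_from_t) - V_s_t
print(advantage)  # -2.0 -> a_t did worse than the average outcome from s_t
```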