1Cademy - Advantage Function Formula

Policy A Reward Sequence: [+10, +1, +1, +1, ...]
Policy B Reward Sequence: [+3, +3, +3, +3, ...]

Learn Before

State-Value Function (V) Formula

Advantage Function Formula

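The Advantage Function, $A(s_t, a_t)$, measures the relative benefit of taking a specific action $a_t$ in a state $s_t$ compared to the expected value from that state onward. It is calculated by subtracting the state-value function, $V(s_t)$, which acts as a baseline ($b$), from the sum of rewards received from time step $t$ through the final step $T$. The formula is: $A(s_t, a_t) = \sum_{k=t}^{T} r_k - V(s_t)$. A positive value means the action led to a higher return than the baseline expected from $s_t$; a negative value means it led to a lower one.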
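As a rough illustration of the formula, the sketch below computes the advantage for one time step of a finite episode. The function name `advantage`, the reward list, and the baseline value are all hypothetical and chosen only for this example. In practice $V(s_t)$ would come from a learned or estimated value function.

```python
from typing import Sequence


def advantage(rewards: Sequence[float], t: int, v_t: float) -> float:
    """Return sum_{k=t}^{T} r_k - V(s_t) for a finite episode.

    rewards: r_0, ..., r_T observed in one episode (illustrative values)
    t:       index of the time step at which the action was taken
    v_t:     baseline V(s_t), assumed to be supplied by the caller
    """
    # Sum of rewards from step t to the end of the episode.
    return_from_t = sum(rewards[t:])
    # Subtract the baseline to get the advantage.
    return return_from_t - v_t


# Hypothetical episode with rewards r_0..r_3 and an assumed baseline V(s_1) = 4.0.
rewards = [1.0, 0.0, 2.0, 3.0]
adv = advantage(rewards, t=1, v_t=4.0)
print(adv)  # (0 + 2 + 3) - 4 = 1.0
```

Here the action at step 1 yields a return of 5.0 against a baseline of 4.0, so the advantage is positive and the action did better than expected from that state.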