1Cademy - Advantage Function in Terms of Q-values and V-values

Learn Before

State-Value and Action-Value Functions

Formula

The advantage function, $A(s_t, a_t)$, measures the benefit of selecting a particular action $a_t$ in a state $s_t$ relative to the expected value of following the policy from that state onward. It is calculated as the difference between the action-value function ($Q$-value) for the specific state-action pair and the state-value function ($V$-value) for that state. The formula is:

$$A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$$

A positive advantage indicates that the action is better than the policy's expected outcome from that state, while a negative advantage indicates that it is worse.
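As a rough illustration of $A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$, the sketch below computes advantages for a single state with three actions. The action names, $Q$-values, and policy probabilities are made-up toy numbers, not taken from any source. It also assumes the standard identity that $V(s)$ equals the policy-weighted average of $Q(s, a)$ over actions, which the text above does not state explicitly.

```python
# Hypothetical toy values for one state with three actions (illustration only).
q_values = {"left": 1.0, "stay": 2.0, "right": 4.0}   # Q(s, a)
policy = {"left": 0.25, "stay": 0.5, "right": 0.25}   # pi(a | s), sums to 1


def state_value(q, pi):
    """V(s), assumed here to be the policy-weighted average of Q(s, a)."""
    return sum(pi[a] * q[a] for a in q)


def advantages(q, pi):
    """A(s, a) = Q(s, a) - V(s) for every action a in the state."""
    v = state_value(q, pi)
    return {a: q[a] - v for a in q}


v = state_value(q_values, policy)
adv = advantages(q_values, policy)

print("V(s) =", v)
for action, value in adv.items():
    sign = "better" if value > 0 else "worse"
    print(f"A(s, {action}) = {value:+.2f} ({sign} than the policy average)")
```

With these numbers, only `right` has a positive advantage, and the advantages weighted by the policy probabilities sum to zero. That follows from the assumed definition of $V(s)$ as an average over the policy's own action choices.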