1Cademy - Definition of the Advantage Function

Learn Before

Actor-Critic Methods

Definition

Definition of the Advantage Function

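The advantage function, $A(s_t, a_t)$, measures how much better or worse it is, in expected return, to take a specific action $a_t$ in state $s_t$ than to simply act according to the policy $\pi$ from that state onward. It is defined as the difference between the action-value function ($Q$-value) and the state-value function ($V$-value):

$A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$

Because $V(s_t) = \mathbb{E}_{a \sim \pi(\cdot \mid s_t)}[Q(s_t, a)]$ is the policy-weighted average of the $Q$-values, a positive advantage means the action is better than the policy's average behavior in that state, a negative advantage means it is worse, and the advantage has expected value zero under $\pi$. This formulation is central to methods such as the Advantage Actor-Critic (A2C) algorithm, where the advantage weights the policy gradient: subtracting $V(s_t)$ as a baseline reduces the variance of the gradient estimate without biasing it, so updates raise the probability of actions with positive advantage and lower the probability of actions with negative advantage.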
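The definition can be made concrete with a short sketch. It assumes a single state with a small discrete action set, where the $Q$-values and the policy probabilities $\pi(a \mid s)$ are already known; the function names are hypothetical. The last function is a separate, commonly used one-step estimate $r_t + \gamma V(s_{t+1}) - V(s_t)$ for when only a learned critic $V$ is available, as in actor-critic training; it is an illustrative choice, not necessarily the exact estimator used in any particular A2C implementation.

```python
from typing import List, Sequence


def state_value(q_values: Sequence[float], policy_probs: Sequence[float]) -> float:
    """V(s) = sum_a pi(a|s) * Q(s, a): the policy-weighted average of the Q-values."""
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    return sum(p * q for p, q in zip(policy_probs, q_values))


def advantages(q_values: Sequence[float], policy_probs: Sequence[float]) -> List[float]:
    """A(s, a) = Q(s, a) - V(s) for every action a available in state s."""
    v = state_value(q_values, policy_probs)
    return [q - v for q in q_values]


def td_advantage(reward: float, value_s: float, value_next: float,
                 gamma: float = 0.99, done: bool = False) -> float:
    """Hypothetical helper: one-step TD estimate of A(s_t, a_t) from a critic V.

    A(s_t, a_t) is approximated by r_t + gamma * V(s_{t+1}) - V(s_t);
    the bootstrap term is dropped when s_{t+1} is terminal.
    """
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


if __name__ == "__main__":
    q = [16.0, 8.0, 8.0]       # assumed Q(s, a) for three actions
    pi = [0.25, 0.25, 0.5]     # assumed policy probabilities pi(a|s)
    print(state_value(q, pi))  # 10.0
    print(advantages(q, pi))   # [6.0, -2.0, -2.0]
    print(td_advantage(reward=2.0, value_s=5.0, value_next=4.0, gamma=0.5))  # -1.0
```

In this example the second action has $Q = 8$ while $V = 10$, so its advantage is $-2$: it is worse than the policy's average behavior in that state, even though its value is positive. Weighting the advantages by $\pi$ gives $0.25 \cdot 6 + 0.25 \cdot (-2) + 0.5 \cdot (-2) = 0$, consistent with the zero-mean property.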

Updated 2026-05-01

References

Reference of Foundations of Large Language Models Course

Learn After

Policy Gradient with Advantage Function Formula
A2C Loss Function Formulation
In a reinforcement learning scenario, an agent is in a particular state. The estimated value of being in this state, averaged over all possible actions the agent could take, is +10. If the agent chooses a specific action, the estimated value of taking that particular action in that state is +8. Based on this information, what can be concluded about this specific action?
If an action has a positive advantage value, it means that taking this action is guaranteed to result in a higher immediate reward than any other action available in that state.
Interpreting Action Advantage
