1Cademy - Advantage Actor-Critic (A2C) Method

Advantage Actor-Critic (A2C) Method

The Advantage Actor-Critic (A2C) method is a reinforcement learning algorithm that optimizes a policy through two interacting components. The actor learns the policy by updating its parameters with a policy gradient objective weighted by the advantage function $A(s_t, a_t)$, so that actions that turned out better than expected become more likely and actions that turned out worse become less likely. The critic acts as an evaluator: it updates its estimate of the state-value function $V(s_t)$, which is then used to compute the advantage.
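
The interaction between the two components can be illustrated with a minimal sketch of a single update. It is not a full A2C implementation: it assumes a tabular setting (one row of action logits per state, one value per state), a softmax policy, and the common one-step estimate $A(s_t, a_t) \approx r_t + \gamma V(s_{t+1}) - V(s_t)$, none of which the text above specifies. The names `a2c_step`, `theta`, `v`, and the learning rates are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Turn a list of action logits into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_step(theta, v, s, a, r, s_next, done,
             gamma=0.99, actor_lr=0.1, critic_lr=0.1):
    """Apply one actor and one critic update from a transition (s, a, r, s_next).

    theta: list of per-state action logits (the actor's parameters, assumed tabular)
    v:     list of per-state value estimates (the critic's parameters, assumed tabular)
    Both are modified in place; the estimated advantage is returned.
    """
    # Critic's one-step target; a terminal next state contributes no future value.
    target = r + (0.0 if done else gamma * v[s_next])
    advantage = target - v[s]

    # Actor: gradient ascent on advantage * log pi(a | s).
    # For a softmax policy, d log pi(a|s) / d logit_b = 1[b == a] - pi(b|s).
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += actor_lr * advantage * grad_log_pi

    # Critic: move V(s) toward the target.
    v[s] += critic_lr * advantage
    return advantage


# Example: two states, two actions, everything initialised to zero.
theta = [[0.0, 0.0], [0.0, 0.0]]
v = [0.0, 0.0]
adv = a2c_step(theta, v, s=0, a=1, r=1.0, s_next=1, done=False)
print(adv, softmax(theta[0]), v[0])
```

In this example the reward is higher than the critic predicted, so the advantage is positive: the actor raises the probability of the action that was taken in state 0, and the critic raises its estimate of $V(s_0)$. Practical implementations typically replace the tables with neural networks and average the update over a batch of transitions.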