A2C Loss Function Formulation
In the Advantage Actor-Critic (A2C) algorithm, the loss function is constructed from the policy gradient objective that uses the advantage function. This objective, often expressed as a utility function $J(\theta) = \mathbb{E}_{(s,a) \sim \pi_\theta}\left[\log \pi_\theta(a \mid s)\, A(s, a)\right]$, forms the core of the actor's loss: the loss is the negated objective, $L_{\text{actor}} = -\log \pi_\theta(a \mid s)\, A(s, a)$, and is minimized during training to improve the policy. By maximizing the utility over sampled trajectories $\tau$, the model adjusts its policy to assign higher probability to actions with higher advantages.
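A minimal sketch of this actor loss in PyTorch (not from the source page; the tensor names, shapes, and the batched-mean reduction are assumptions for illustration):

```python
import torch

def a2c_actor_loss(logits, actions, advantages):
    """A2C actor loss: L = -log pi(a|s) * A(s, a), averaged over a batch.

    logits:     (batch, n_actions) unnormalized action scores from the policy head
    actions:    (batch,) long tensor of the actions actually taken
    advantages: (batch,) advantage estimates A(s, a) from the critic
    """
    log_probs = torch.log_softmax(logits, dim=-1)                 # log pi(.|s)
    taken = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)  # log pi(a|s)
    # Detach the advantage so the actor gradient does not flow into the critic.
    return -(taken * advantages.detach()).mean()
```

Minimizing this loss performs gradient ascent on $J(\theta)$: actions with positive advantage have their log-probability pushed up, and actions with negative advantage pushed down.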
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A2C Loss Function Formulation
An agent is being trained using a policy gradient method. The objective is to maximize the function $J(\theta) = \mathbb{E}\left[\log \pi_\theta(a \mid s)\, A(s, a)\right]$, where $\pi_\theta$ is the policy and $A(s, a)$ is the advantage function, which indicates how much better an action is than the average action in that state.
At a specific state $s$, the agent can choose from three actions: $a_1$, $a_2$, and $a_3$, each with a calculated advantage value $A(s, a_i)$.
Assuming the agent performs one optimization step to maximize the objective, how will the policy probabilities for these actions most likely change?
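A small numerical sketch makes the likely answer concrete. The advantage values below (+2.0, 0.0, -1.0) are hypothetical stand-ins, since the card's original numbers are not shown here; PyTorch and a uniform initial policy are also assumptions:

```python
import torch

# Hypothetical advantage values for actions a1, a2, a3 (illustrative only).
advantages = torch.tensor([2.0, 0.0, -1.0])

logits = torch.zeros(3, requires_grad=True)  # uniform initial policy
log_probs = torch.log_softmax(logits, dim=-1)

# Policy gradient surrogate: sum over actions of log pi(a|s) * A(s, a).
objective = (log_probs * advantages).sum()
objective.backward()

with torch.no_grad():
    logits += 0.1 * logits.grad  # one gradient *ascent* step

print(torch.softmax(logits, dim=-1))
# The positive-advantage action's probability rises, the negative-advantage
# action's falls, and the zero-advantage action contributes no gradient of
# its own: its probability shifts only through renormalization.
```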
Impact of a Zero Advantage Value
Policy Gradient with Advantage Function Formula
Rationale for Using the Advantage Function in Policy Gradients
Your team is running RLHF for a customer-facing LL...
You’re running an RLHF fine-tuning job for an inte...
You are reviewing an RLHF training run for an inte...
Diagnosing Instability in an RLHF + PPO Training Run
Interpreting Conflicting RLHF Signals: Reward Model Ranking vs. PPO Updates Under KL Regularization
Choosing and Justifying an RLHF Objective Under Competing Product Constraints
Designing an RLHF Training Blueprint for a Regulated Customer-Support LLM
Tuning an RLHF + PPO Update When Reward Improves but Behavior Regresses
Post-Deployment Drift After RLHF: Diagnosing Reward Model and PPO/KL Interactions
Root-Cause Analysis of a “Reward Hacking” Spike During RLHF with PPO
In a reinforcement learning scenario, an agent is in a particular state. The estimated value of being in this state, averaged over all possible actions the agent could take, is +10. If the agent chooses a specific action, the estimated value of taking that particular action in that state is +8. Based on this information, what can be concluded about this specific action?
If an action has a positive advantage value, it means that taking this action is guaranteed to result in a higher immediate reward than any other action available in that state.
Interpreting Action Advantage
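For the scenario above, the conclusion follows from the definition of the advantage as the action value minus the state value:

$$A(s, a) = Q(s, a) - V(s) = 8 - 10 = -2,$$

so this specific action is expected to be worth 2 less than the average action in that state. Note that the advantage compares expected returns to the state's average; it says nothing about guaranteed immediate rewards, which is why the quoted statement about positive advantages is false.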
Learn After
A2C Actor Loss Function
Application of A2C in RLHF for LLM Alignment
Advantage Estimation for A2C with a Reward Model
In an actor-critic reinforcement learning algorithm, the policy is updated to maximize the objective function $J(\theta) = \mathbb{E}\left[\log \pi_\theta(a \mid s)\, A(s, a)\right]$, where $A(s, a)$ is the advantage of taking action $a$ in state $s$. If, for a specific state-action pair $(s, a)$, the calculated advantage $A(s, a)$ is a large positive value, what is the intended immediate effect on the policy after a gradient-based update step?
Analysis of a Policy Gradient Update
In an actor-critic reinforcement learning framework, the actor's objective is to adjust its policy parameters, $\theta$, to maximize the utility function $J(\theta) = \mathbb{E}\left[\log \pi_\theta(a \mid s)\, A(s, a)\right]$. Consider the following statement: 'If the advantage function $A(s, a)$ for a specific action $a$ is negative, the optimization process will adjust the policy parameters to decrease the probability of selecting that action in state $s$ in the future.'
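Both sign cases above follow from the same gradient step. Gradient ascent on $J(\theta)$ updates the parameters as

$$\theta \leftarrow \theta + \alpha\, A(s, a)\, \nabla_\theta \log \pi_\theta(a \mid s),$$

so a large positive $A(s, a)$ moves $\theta$ in the direction that increases $\log \pi_\theta(a \mid s)$, raising the probability of $a$ in state $s$, while a negative $A(s, a)$ moves $\theta$ in the opposite direction and lowers that probability, which is exactly what the quoted statement claims.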