A2C Actor Loss Function
The actor's loss function in the Advantage Actor-Critic (A2C) framework is designed to optimize the policy by maximizing expected utility. The loss is the negative expected value of the utility over trajectories τ sampled from a dataset D. Mathematically, it is expressed as: L(θ) = -E_{τ~D}[ Σ log πθ(at|st) * A(st, at) ], where A(st, at) is the advantage of taking action at in state st. By minimizing this loss function, the model adjusts its policy to favor actions that result in a higher advantage, thereby improving overall performance.
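A minimal sketch of this loss in PyTorch (not from the course text; the tensor names and the single-trajectory batching are illustrative assumptions):

```python
import torch

def a2c_actor_loss(log_probs: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    """A2C actor loss: -E[ sum_t log pi_theta(a_t|s_t) * A(s_t, a_t) ].

    log_probs:  log pi_theta(a_t|s_t) for the actions actually taken, shape (T,)
    advantages: advantage estimates A(s_t, a_t), shape (T,); detached so that
                gradients flow only through the policy, not the critic.
    """
    return -(log_probs * advantages.detach()).sum()

# Toy usage for a single sampled trajectory of length T = 3.
probs = torch.tensor([0.6, 0.2, 0.7], requires_grad=True)  # pi_theta(a_t|s_t)
advantages = torch.tensor([1.5, -0.5, 0.8])
loss = a2c_actor_loss(torch.log(probs), advantages)
loss.backward()  # gradient step raises prob. of positive-advantage actions
```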

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A2C Actor Loss Function
Application of A2C in RLHF for LLM Alignment
Advantage Estimation for A2C with a Reward Model
In an actor-critic reinforcement learning algorithm, the policy is updated to maximize the objective function J(θ) = E[ Σ log πθ(at|st) * A(st, at) ], where A(st, at) is the advantage of taking action at in state st. If, for a specific state-action pair (s, a), the calculated advantage A(s, a) is a large positive value, what is the intended immediate effect on the policy after a gradient-based update step?
Analysis of a Policy Gradient Update
In an actor-critic reinforcement learning framework, the actor's objective is to adjust its policy parameters, θ, to maximize the utility function J(θ). Consider the following statement: 'If the advantage function A(s, a) for a specific action a is negative, the optimization process will adjust the policy parameters to decrease the probability of selecting that action in state s in the future.'
Optimal Reward Model Parameter Estimation
Fine-Tuning Objective Function
Denoising Autoencoder Training Objective
Language Model Loss as Negative Expected Utility
MLM Training Objective using Cross-Entropy Loss
Training Objective as Loss Minimization over a Dataset
A machine learning model's performance is evaluated using a loss function, L(θ), where θ represents the model's parameters. A lower loss value indicates better performance. The training objective is to find the optimal parameters, θ̃, using the formula: θ̃ = arg min_θ L(θ). Given the following loss values for different parameter settings: L(θ=1) = 0.8, L(θ=2) = 0.3, L(θ=3) = 0.1, L(θ=4) = 0.5. Which statement correctly interprets the training objective?
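For reference, an illustrative snippet (plain Python, not part of the original question) that applies θ̃ = arg min_θ L(θ) to the values given:

```python
# Loss values from the question: theta -> L(theta)
losses = {1: 0.8, 2: 0.3, 3: 0.1, 4: 0.5}

# theta_tilde = arg min_theta L(theta): pick the theta with the lowest loss.
theta_tilde = min(losses, key=losses.get)
print(theta_tilde)  # 3, since L(theta=3) = 0.1 is the minimum
```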
A data scientist trains two models, Model X and Model Y, on the same dataset for the same task. The training objective for each is to find the set of parameters, θ, that minimizes a loss function, L(θ), according to the principle: θ̃ = arg min_θ L(θ). After training, the results are as follows:
- For Model X, the lowest achieved loss is 50, using parameters θ_X.
- For Model Y, the lowest achieved loss is 100, using parameters θ_Y.
Based only on this information and the definition of the training objective, what is the most valid conclusion?
Evaluating a Training Conclusion
An agent is learning a task using a policy update rule defined by the following equation, where πθ(at|st) is the policy and A(st, at) is the advantage of taking action at in state st: L(θ) = -E[ Σ log πθ(at|st) * A(st, at) ]. In a specific state s, the agent takes an action a that results in an advantage value A(s, a) = -3.0. Based on this single experience, how will the update rule adjust the policy πθ?
Diagnosing Policy Update Instability
Role of the Advantage Function in Policy Updates
Learn After
The loss function for an actor's policy, π, is given by: L(θ) = -E[ Σ log π(a|s) * A(s,a) ], where A(s,a) is the advantage for taking action 'a' in state 's'. The training process works by minimizing this loss. If an agent takes an action that results in a large positive advantage, what is the direct effect of this event on the policy update?
An agent is being trained using an actor-critic method where the actor's loss is the negative of the expected sum of the log-probabilities of actions multiplied by their advantage values. During one training step, the agent selects an action that results in a large negative advantage. True or False: The optimization process, which aims to minimize the actor's loss, will update the policy to decrease the likelihood of selecting this action in the same state in the future.
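As a quick numerical check of this claim (a PyTorch sketch; the three-action state, uniform initial policy, and learning rate are assumptions of the example), one gradient-descent step on the actor loss with A(s, a) = -3.0 does reduce the probability of the taken action:

```python
import torch

# Single state, 3 possible actions; logits parameterize the policy pi_theta.
logits = torch.zeros(3, requires_grad=True)
log_probs = torch.log_softmax(logits, dim=0)

action = 0
advantage = -3.0  # large negative advantage, as in the question above

# Actor loss for this one experience: -log pi(a|s) * A(s, a).
loss = -log_probs[action] * advantage
loss.backward()

# One gradient-descent step: theta <- theta - lr * grad.
with torch.no_grad():
    new_logits = logits - 0.1 * logits.grad
new_log_probs = torch.log_softmax(new_logits, dim=0)

# With A < 0, the probability of the taken action goes down.
print(new_log_probs[action].exp() < log_probs[action].exp())  # tensor(True)
```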
Policy Gradient Utility for Sequence Generation
Policy Update Analysis