Formula

A2C Actor Loss Function

The actor's loss function in the Advantage Actor-Critic (A2C) framework is designed to optimize the policy by maximizing expected utility. The loss is defined as the negative expected value of the utility function $U(\tau; \theta)$ over trajectories $\tau$ sampled from a dataset $\mathcal{D}$. Mathematically, it is expressed as:

$$\mathcal{L}(\theta) = -\mathbb{E}_{\tau \sim \mathcal{D}} \big[ U(\tau; \theta) \big] = -\mathbb{E}_{\tau \sim \mathcal{D}} \Big[ \sum_{t=1}^{T} \log \pi_{\theta}(a_t|s_t) \, A(s_t,a_t) \Big]$$

By minimizing this loss function, the model adjusts its policy $\pi_{\theta}$ to favor actions that result in a higher advantage $A(s_t,a_t)$, thereby improving overall performance.
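To make the formula concrete, the following is a minimal sketch of how the loss could be estimated from a batch of sampled trajectories. It makes several assumptions that are not stated above: the policy is a softmax over a discrete set of action logits, the expectation is approximated by a sample mean over the batch, and the advantages are supplied as fixed numbers. The function names (`log_softmax`, `trajectory_utility`, `a2c_actor_loss`) and the trajectory layout are hypothetical, chosen only for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def trajectory_utility(logits_seq, actions, advantages):
    """U(tau; theta) = sum_t log pi_theta(a_t | s_t) * A(s_t, a_t).

    Assumed layout (hypothetical): one logit list, one action index,
    and one advantage value per time step.
    """
    total = 0.0
    for logits, action, advantage in zip(logits_seq, actions, advantages):
        total += log_softmax(logits)[action] * advantage
    return total

def a2c_actor_loss(batch):
    """Negative mean utility over the sampled trajectories in `batch`."""
    utilities = [trajectory_utility(*trajectory) for trajectory in batch]
    return -sum(utilities) / len(utilities)

# Toy batch: two trajectories, three steps each, two actions,
# a uniform policy (equal logits) and unit advantages.
uniform = [[0.0, 0.0]] * 3
batch = [
    (uniform, [0, 1, 0], [1.0, 1.0, 1.0]),
    (uniform, [1, 1, 0], [1.0, 1.0, 1.0]),
]
loss = a2c_actor_loss(batch)
```

In this toy batch every step contributes $\log(1/2)$ to the utility, so the loss is $3 \log 2 \approx 2.079$. Raising the logit of an action with positive advantage increases its log-probability and lowers the loss, which is the behavior the formula describes.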


Updated 2026-05-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
