Learn Before
Value Network Loss Function in A2C
In the Advantage Actor-Critic (A2C) algorithm, the loss function for the value network (or critic network), parameterized by ω, is defined as the mean squared temporal difference (TD) error over a batch of experiences. The formula is given by:

L(ω) = (1/N) · Σ_{t=1}^{N} (G_t − V(s_t))², where G_t = r_t + γ · V(s_{t+1})

Here, N is the number of training samples (for example, for a sequence of m tokens, we can set N = m). The term G_t represents the computed return (TD target), and V(s_t) is the predicted state value. Minimizing this loss trains the critic to accurately evaluate the expected return.
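A minimal sketch of this loss in PyTorch (the function name, tensor shapes, and the choice to detach the TD target are illustrative assumptions, not part of the original definition):

```python
import torch

def critic_loss(rewards, values, next_values, gamma=0.99):
    """Mean squared TD error for an A2C critic.

    rewards:     r_t for each transition, shape (N,)
    values:      V(s_t) predicted by the value network, shape (N,)
    next_values: V(s_{t+1}) predicted for the successor states, shape (N,)
    """
    # TD target G_t = r_t + gamma * V(s_{t+1}); detached so gradients
    # flow only through the prediction V(s_t), a common A2C convention.
    targets = rewards + gamma * next_values.detach()
    # Average the squared TD error over the N samples in the batch.
    return torch.mean((targets - values) ** 2)
```

For a generated sequence of m tokens, the batch dimension N here would simply be m, matching the definition above.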

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Value Network Loss Function in A2C
In a reinforcement learning agent using an actor-critic architecture, the critic network is being trained. For a given state transition, the network makes the following predictions:
- Predicted value for the current state: 15.0
- Predicted value for the next state: 20.0
The agent receives a reward of 5.0 for the transition, and the discount factor is 0.9.
Based on this single experience, how should the critic network's parameters be adjusted to minimize its loss?
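One way to check the arithmetic (assuming the standard TD target r_t + γ · V(s_{t+1})): the target here is 5.0 + 0.9 × 20.0 = 23.0, which is higher than the current prediction of 15.0, so minimizing the squared error adjusts the parameters to raise V(s_t) toward 23.0.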
Critic Network Training Target
Critic Network Performance Analysis
Learn After
Batch Size for Sequential Data in A2C Value Loss
An agent is being trained using a reinforcement learning algorithm where the value network's loss is based on the mean squared temporal difference (TD) error. For a single transition, the agent moves from state s_t to s_{t+1}, receiving a reward r_t. The value network predicts the value of the current state as V(s_t) and the next state as V(s_{t+1}). Given the following values, calculate the squared TD error, which represents the loss for this single sample before averaging:
- Reward r_t = 2
- Discount factor γ = 0.9
- Predicted value of current state V(s_t) = 5.0
- Predicted value of next state V(s_{t+1}) = 4.0
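As a worked check (assuming the single-sample loss δ_t², with δ_t = r_t + γ · V(s_{t+1}) − V(s_t)): δ_t = 2 + 0.9 × 4.0 − 5.0 = 0.6, so the squared TD error is 0.6² = 0.36.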
An agent is being trained using an algorithm where the value network's performance is measured by the mean squared difference between its predicted value for a state, V(s_t), and a computed target value, r_t + γ * V(s_{t+1}). During a particular training batch, the network consistently produces predictions V(s_t) that are significantly lower than the computed target values. What is the most direct effect on the network's parameters during the subsequent optimization step?
Rationale for the Value Network Target