1Cademy - Role of the Critic in Advantage Function Calculation

Learn Before

Actor-Critic Methods

Activity (Process)

Role of the Critic in Advantage Function Calculation

In actor-critic frameworks such as Advantage Actor-Critic (A2C), computing the advantage function relies on a critic network that is trained alongside the actor. The critic evaluates the policy being learned by the actor: its job is to maintain and update an estimate of the state-value function, $V(s_t)$, under the current policy. Once the critic provides a reasonably reliable estimate of $V(s_t)$, that estimate is used to calculate the advantage, typically as the temporal difference (TD) error $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$, where $r_t$ is the reward received at step $t$ and $\gamma$ is the discount factor.
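The relationship between the critic's value estimate and the advantage can be illustrated with a minimal sketch. It assumes a tabular critic (a plain dictionary of state values) rather than the neural network A2C would normally use, and a one-step TD error as the advantage estimate. The names `td_error`, `critic_update`, `alpha`, and the states `"A"` and `"B"` are hypothetical and chosen only for illustration.

```python
def td_error(V, s, r, s_next, done, gamma=0.99):
    """One-step TD error: r + gamma * V(s') - V(s)."""
    # Do not bootstrap from the next state if the episode has ended.
    bootstrap = 0.0 if done else gamma * V.get(s_next, 0.0)
    return r + bootstrap - V.get(s, 0.0)


def critic_update(V, s, r, s_next, done, alpha=0.1, gamma=0.99):
    """Move V(s) toward the TD target and return the TD error.

    The returned TD error is computed from the critic's estimate
    before the update and can serve as the advantage estimate
    for the action taken in state s.
    """
    delta = td_error(V, s, r, s_next, done, gamma)
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta


# Illustrative values only (assumed, not taken from any experiment).
V = {"A": 0.5, "B": 1.0}
advantage = critic_update(V, "A", r=1.0, s_next="B", done=False,
                          alpha=0.1, gamma=0.9)
print(advantage, V["A"])
```

In this sketch the same quantity plays two roles: it is the error signal that improves the critic's estimate of $V(s_t)$, and it is the advantage estimate the actor would use to weight its policy update. A positive value suggests the action led to a better outcome than the critic expected from that state, and a negative value suggests a worse one.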

Updated 2026-05-01

References

Reference of Foundations of Large Language Models Course

Learn After

Critic Network Loss in A2C
Training the Value Function with a Reward Model
In an actor-critic learning process, an agent is being trained. It is observed that the agent repeatedly takes actions that lead to states with poor long-term outcomes. Assuming the action-selection mechanism is functioning correctly based on its inputs, which of the following describes the most probable malfunction in the state-value estimation component that would cause this behavior?
Debugging an Actor-Critic Agent's Performance
The Critic's Role as a Baseline
