Activity (Process)

Role of the Critic in Advantage Function Calculation

In actor-critic frameworks such as Advantage Actor-Critic (A2C), computing the advantage function relies on a critic network. The critic evaluates the policy being learned by the actor. It is trained to estimate the state-value function, $V(s_t)$, and keeps refining that estimate as the policy changes. Once the critic provides a reliable estimate of $V(s_t)$, this value is used to calculate the advantage, typically through the temporal-difference (TD) error $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$, which serves as an estimate of the advantage $A(s_t, a_t)$.
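To make the critic's role concrete, here is a minimal sketch that swaps the neural-network critic for a lookup table (an assumption made purely for readability; A2C normally uses a function approximator trained by regression toward the TD target). The class `TabularCritic` and the helper `td_error` are hypothetical names. The critic updates its estimate of $V(s)$ with TD(0) steps, and the same TD error it computes is what the actor would use as its advantage estimate.

```python
from typing import Dict, Hashable


def td_error(reward: float, v_s: float, v_next: float, gamma: float, done: bool) -> float:
    """One-step TD error: delta = r + gamma * V(s') - V(s).

    At a terminal transition there is no next state, so V(s') is treated as 0.
    """
    bootstrap = 0.0 if done else gamma * v_next
    return reward + bootstrap - v_s


class TabularCritic:
    """Hypothetical lookup-table critic estimating V(s) with TD(0) updates."""

    def __init__(self, lr: float = 0.1, gamma: float = 0.99) -> None:
        self.values: Dict[Hashable, float] = {}  # unseen states default to 0.0
        self.lr = lr
        self.gamma = gamma

    def value(self, state: Hashable) -> float:
        return self.values.get(state, 0.0)

    def advantage(self, state: Hashable, reward: float,
                  next_state: Hashable, done: bool) -> float:
        # The TD error under the current critic doubles as the advantage estimate.
        return td_error(reward, self.value(state), self.value(next_state),
                        self.gamma, done)

    def update(self, state: Hashable, reward: float,
               next_state: Hashable, done: bool) -> float:
        # Compute delta with the current estimates, then move V(s) toward the TD target.
        delta = self.advantage(state, reward, next_state, done)
        self.values[state] = self.value(state) + self.lr * delta
        return delta  # returned so an actor update could consume it


if __name__ == "__main__":
    # Toy two-step episode: s0 -> s1 (reward 0), then s1 -> terminal (reward 1).
    critic = TabularCritic(lr=0.5, gamma=0.9)
    episode = [("s0", 0.0, "s1", False), ("s1", 1.0, "end", True)]
    for _ in range(200):
        for s, r, s_next, done in episode:
            critic.update(s, r, s_next, done)
    print(round(critic.value("s0"), 3), round(critic.value("s1"), 3))  # prints "0.9 1.0"
```

On this toy chain the critic converges to $V(s_1) = 1$ and $V(s_0) = \gamma \cdot 1 = 0.9$, after which the TD errors on these transitions shrink toward zero. That matches the intuition that an action is only "advantageous" when it does better than the critic already expects.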



Updated 2026-05-01


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences