Concept

Critic Network Loss in A2C

In the Advantage Actor-Critic (A2C) algorithm, the critic network (or value network) is trained by minimizing a loss that is generally formulated as the mean squared error between the bootstrapped return estimate, $r_t + \gamma V(s_{t+1})$, and the predicted state value, $V(s_t)$. Training adjusts the critic network's parameters, denoted by $\omega$, to reduce this error, which improves the critic's evaluation of the current policy.
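As a rough illustration, the loss described above can be written over a batch of $M$ transitions as $L(\omega) = \frac{1}{M} \sum_{t} \big(r_t + \gamma V_\omega(s_{t+1}) - V_\omega(s_t)\big)^2$. The sketch below computes this quantity in plain Python from value predictions that are already available. The function name `critic_loss`, the default `gamma`, and the optional `dones` flag for terminal states are assumptions made for this example and are not part of the original description.

```python
def critic_loss(rewards, values, next_values, gamma=0.99, dones=None):
    """Mean squared error between r_t + gamma * V(s_{t+1}) and V(s_t).

    rewards[i]     : reward r_t of transition i
    values[i]      : critic prediction V(s_t)
    next_values[i] : critic prediction V(s_{t+1})
    dones[i]       : hypothetical flag; if True, the bootstrap term is dropped
    """
    n = len(rewards)
    if n == 0 or len(values) != n or len(next_values) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    if dones is None:
        dones = [False] * n
    if len(dones) != n:
        raise ValueError("dones must match the batch length")

    total = 0.0
    for r, v, v_next, done in zip(rewards, values, next_values, dones):
        # Bootstrapped target; assumed to be r_t alone at a terminal state.
        target = r + (0.0 if done else gamma * v_next)
        total += (target - v) ** 2
    return total / n


# Example batch of two transitions with gamma = 0.5.
loss = critic_loss([1.0, 0.0], [0.5, 1.0], [1.0, 0.0], gamma=0.5)
```

In a gradient-based implementation, the target $r_t + \gamma V(s_{t+1})$ is commonly treated as a constant, so that only the prediction $V(s_t)$ contributes to the gradient with respect to $\omega$. That detail is outside what this plain-Python sketch shows.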

Updated 2026-05-01

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences