1Cademy - In a reinforcement learning agent using an actor-critic architecture, the critic network is being trained. For a given state transition, the network makes the following predictions: - Predicted value for the current state: 15.0 - Predicted value for the next state: 20.0 The agent receives a reward of 5.0 for the transition, and the discount factor is 0.9. Based on this single experience, how should the critic networks parameters be adjusted to minimize its loss?

Learn Before

Critic Network Loss in A2C

Multiple Choice

In a reinforcement learning agent using an actor-critic architecture, the critic network is being trained. For a given state transition, the network makes the following predictions:

Predicted value for the current state: 15.0
Predicted value for the next state: 20.0

The agent receives a reward of 5.0 for the transition, and the discount factor is 0.9.

Based on this single experience, how should the critic network's parameters be adjusted to minimize its loss?

Updated 2025-09-28

Contributors are:

Who are from:

Learn Before

Related