Learn Before
Critic Network Training Target
In an actor-critic reinforcement learning framework, the critic network learns to estimate the value of being in a particular state. To train this network, its output for a state is compared against a 'target' value using a mean squared error loss. Describe the two components that are combined to create this target value for a given state transition.
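A minimal sketch of the idea the question is probing (not an answer key from the card itself): in the standard TD(0) setup, the target combines the observed reward with the discounted value the critic itself predicts for the next state, and the critic is trained with a squared-error loss against that target. The function names here are illustrative, not from the source.

```python
# Sketch of the TD(0) training target for a critic network, assuming the
# standard actor-critic setup: target = reward + gamma * V(next_state).

def td_target(reward: float, next_state_value: float, gamma: float) -> float:
    """Combine the two target components: the immediate reward and the
    discounted predicted value of the next state."""
    return reward + gamma * next_state_value

def critic_loss(predicted_value: float, target: float) -> float:
    """Squared error between the critic's prediction and the TD target."""
    return (predicted_value - target) ** 2
```

In practice the target is treated as a constant (no gradient flows through the next-state prediction), so minimizing this loss only moves the current-state estimate toward the target.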
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Value Network Loss Function in A2C
In a reinforcement learning agent using an actor-critic architecture, the critic network is being trained. For a given state transition, the network makes the following predictions:
- Predicted value for the current state: 15.0
- Predicted value for the next state: 20.0
The agent receives a reward of 5.0 for the transition, and the discount factor is 0.9.
Based on this single experience, how should the critic network's parameters be adjusted to minimize its loss?
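The numbers above can be worked through directly, assuming the standard TD(0) target (this is a sketch of the expected reasoning, not text from the card):

```python
# Worked numbers from the question, assuming the TD(0) target
# target = reward + gamma * V(next_state).
gamma = 0.9
reward = 5.0
v_current = 15.0   # critic's prediction for the current state
v_next = 20.0      # critic's prediction for the next state

target = reward + gamma * v_next    # 5.0 + 0.9 * 20.0 = 23.0
loss = (v_current - target) ** 2    # (15.0 - 23.0)**2 = 64.0
```

Since the prediction (15.0) falls below the target (23.0), gradient descent on this loss adjusts the critic's parameters to raise its value estimate for the current state toward 23.0.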
Critic Network Training Target
Critic Network Performance Analysis