Debugging an Actor-Critic Agent's Performance
Based on the provided scenario, explain how the critic network's flawed estimation is directly causing the observed training instability. Your explanation should detail the mechanism through which the critic's output influences the actor's learning process.
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Critic Network Loss in A2C
Training the Value Function with a Reward Model
In an actor-critic learning process, an agent is being trained. It is observed that the agent repeatedly takes actions that lead to states with poor long-term outcomes. Assuming the action-selection mechanism is functioning correctly based on its inputs, which of the following describes the most probable malfunction in the state-value estimation component that would cause this behavior?
Debugging an Actor-Critic Agent's Performance
The Critic's Role as a Baseline