The Critic's Role as a Baseline
In an actor-critic reinforcement learning setup, the 'actor' has just performed an action in a given state and received an immediate reward. To calculate the advantage of this action, another value is needed besides the immediate reward. What specific value must the 'critic' network provide, and why is this value essential for determining if the action was better or worse than expected?
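As a minimal sketch, assume the common one-step temporal-difference (TD) advantage estimate used in A2C-style methods, A(s, a) ≈ r + γ·V(s') − V(s). Here the critic's estimate V(s) acts as the baseline: it is what the agent expected from state s before acting, and the observed outcome is compared against it. The function name `one_step_advantage` and its parameters are hypothetical and chosen for illustration only.

```python
def one_step_advantage(reward: float, value_s: float, value_next: float,
                       gamma: float = 0.99, done: bool = False) -> float:
    """One-step TD advantage estimate: r + gamma * V(s') - V(s).

    value_s    -- critic's estimate V(s), the baseline for the current state
    value_next -- critic's estimate V(s') for the next state
    done       -- if True, s' is terminal, so no future value is bootstrapped
    """
    # Bootstrapped return: immediate reward plus discounted value of s'.
    bootstrap = 0.0 if done else gamma * value_next
    td_target = reward + bootstrap
    # Subtract the baseline: positive means better than expected, negative worse.
    return td_target - value_s


if __name__ == "__main__":
    # Critic expected V(s) = 2.0, but r + gamma * V(s') = 1.0 + 0.5 * 1.5 = 1.75.
    print(one_step_advantage(1.0, 2.0, 1.5, gamma=0.5))  # -0.25 (worse than expected)
    # Terminal transition: only the reward counts, 3.0 - 2.0.
    print(one_step_advantage(3.0, 2.0, 100.0, done=True))  # 1.0 (better than expected)
```

Without the baseline V(s), a positive reward would always look like a "good" action. Subtracting V(s) turns the signal into a relative one, and it typically reduces the variance of the policy-gradient update.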
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Critic Network Loss in A2C
Training the Value Function with a Reward Model
In an actor-critic learning process, an agent is being trained. It is observed that the agent repeatedly takes actions that lead to states with poor long-term outcomes. Assuming the action-selection mechanism is functioning correctly based on its inputs, what is the most probable malfunction in the state-value estimation component that would cause this behavior?
Debugging an Actor-Critic Agent's Performance