1Cademy - An engineer is training a reinforcement learning agent using a policy-based method. They observe the following training behavior: the agent's performance steadily improves for several iterations, but then suddenly collapses, becoming significantly worse than before. This pattern of gradual improvement followed by a catastrophic drop in performance repeats. Which of the following statements provides the most likely explanation for this unstable training dynamic?

Learn Before

Trust Region in Reinforcement Learning Optimization

Multiple Choice

An engineer is training a reinforcement learning agent using a policy-based method. They observe the following training behavior: the agent's performance steadily improves for several iterations, but then suddenly collapses, becoming significantly worse than before. This pattern of gradual improvement followed by a catastrophic drop in performance repeats. Which of the following statements provides the most likely explanation for this unstable training dynamic?

Updated 2025-10-01
