Multiple Choice

An engineer is training a reinforcement learning agent using a policy-based method. They observe the following training behavior: the agent's performance steadily improves for several iterations, but then suddenly collapses, becoming significantly worse than before. This pattern of gradual improvement followed by a catastrophic drop in performance repeats. Which of the following statements provides the most likely explanation for this unstable training dynamic?

0

1

Updated 2025-10-01

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science