Learn Before
Diagnosing Flawed Agent Behavior
An agent is being trained to navigate a maze and reach a specific goal location. After extensive training, the agent is observed to be taking an unnecessarily long, winding path to the goal, often revisiting the same locations multiple times before finally reaching the destination. Analyze the agent's reward structure provided below and explain the logical flaw that is most likely causing this inefficient behavior.
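As a purely illustrative aid, the sketch below shows how a reward design can make a winding path score higher than a direct one. It does not reproduce the reward structure this question refers to. The function name `episode_return`, the reward values, and the path lengths are all hypothetical and chosen only for the comparison.

```python
def episode_return(num_steps, step_reward, goal_reward, gamma=1.0):
    """Discounted return for an episode that reaches the goal on its final step.

    All parameters are hypothetical illustration values, not the
    reward structure from the exercise.
    """
    total = 0.0
    for t in range(num_steps):
        reward = step_reward
        if t == num_steps - 1:
            # The goal bonus is paid once, on the last step of the episode.
            reward += goal_reward
        total += (gamma ** t) * reward
    return total


DIRECT_STEPS = 6    # assumed length of the shortest path
WINDING_STEPS = 30  # assumed length of a path with many revisits

for label, step_reward in [("positive per-step reward", 0.5),
                           ("per-step penalty", -0.5)]:
    direct = episode_return(DIRECT_STEPS, step_reward, goal_reward=10.0)
    winding = episode_return(WINDING_STEPS, step_reward, goal_reward=10.0)
    better = "winding" if winding > direct else "direct"
    print(f"{label}: direct={direct}, winding={winding}, favored={better}")
```

Under these assumed numbers, a positive reward for every step makes the longer path more profitable, while a per-step penalty makes the direct path the better choice. When diagnosing an agent, the same comparison can be run by hand against whatever reward structure is actually in use.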
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing Flawed Agent Behavior
A reinforcement learning agent controls a robot vacuum cleaner. The primary goal is for the robot to collect all pieces of trash in a room as quickly as possible. Which of the following reward function designs would be most effective at encouraging the desired behavior without leading to unintended negative consequences?
Designing a Grid World Reward Function