1Cademy - An autonomous agent is being trained to navigate a maze and reach a specific exit. The agent receives a small negative feedback signal (-0.1) for every step it takes and a large positive feedback signal (+100) only when it reaches the correct exit. The agents goal is to maximize its total feedback score. Given this feedback structure, what is the most likely reason the agent might fail to learn to solve the maze, even after many attempts?

Learn Before

Reward in Reinforcement Learning

Multiple Choice

An autonomous agent is being trained to navigate a maze and reach a specific exit. The agent receives a small negative feedback signal (-0.1) for every step it takes and a large positive feedback signal (+100) only when it reaches the correct exit. The agent's goal is to maximize its total feedback score. Given this feedback structure, what is the most likely reason the agent might fail to learn to solve the maze, even after many attempts?

Updated 2025-09-29

Contributors are:

Who are from:

Learn Before

Related