A reinforcement learning agent is being trained to navigate a maze. The original reward function provides a large positive reward only upon reaching the exit. To speed up learning, a developer adds a shaping reward function that gives a small, constant positive reward for every single action the agent takes, regardless of the state. After this change, the agent learns a new policy of moving in a perpetual loop instead of solving the maze. Why did adding this specific shaping reward alter the optimal policy?
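A brief sketch of the underlying theory, using the standard potential-based shaping result of Ng, Harada, and Russell (1999); the symbols γ, Φ, c, R_exit, and T below are introduced here only for illustration. A shaping term F is guaranteed to leave the optimal policy unchanged only when it has the potential-based form

F(s, a, s') = \gamma \Phi(s') - \Phi(s),

with the potential Φ fixed to zero at terminal states. A constant per-action bonus F = c > 0 does not fit this form in an episodic task: its discounted sum along a trajectory grows with the number of steps taken, instead of telescoping to a term that depends only on the start and terminal states. The bonus therefore rewards staying in the maze. A policy that loops forever collects

\sum_{t=0}^{\infty} \gamma^t c = \frac{c}{1 - \gamma},

whereas a policy that reaches the exit at step T collects \sum_{t=0}^{T-1} \gamma^t c + \gamma^T R_{exit} and then stops earning the bonus. Whenever c / (1 - \gamma) > R_{exit} (and always in the undiscounted case, where the loop's return is unbounded), the perpetual loop has the higher return, so the shaped reward function defines a different optimal policy.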
Tags
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Related
Potential-Based Shaping Function Formula
Analysis of a Flawed Reward Shaping Implementation
Critique of an Arbitrary Shaping Function