1Cademy - Analysis of a Flawed Reward Shaping Implementation

Learn Before

Condition for Policy Invariance in Reward Shaping

Case Study

Analysis of a Flawed Reward Shaping Implementation

Based on the scenario described below, analyze why the agent's learned behavior changed from optimal to suboptimal after the introduction of a shaping reward.

Updated 2025-09-26


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

Potential-Based Shaping Function Formula
Analysis of a Flawed Reward Shaping Implementation
A reinforcement learning agent is being trained to navigate a maze. The original reward function provides a large positive reward only upon reaching the exit. To speed up learning, a developer adds a shaping reward function that gives a small, constant positive reward for every action the agent takes, regardless of the state. After this change, the agent learns a policy that moves in a perpetual loop instead of solving the maze. Why did adding this specific shaping reward alter the optimal policy?
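A minimal sketch makes the effect concrete. It rests on hypothetical assumptions that are not part of the scenario: a 1-D corridor stands in for the maze, with discount γ = 0.95, exit reward 10, and a per-step bonus of 1. Under a constant bonus c on every action, a loop that never terminates earns c/(1 − γ) = 20. Stepping out of the last cell earns only 10 + 1 = 11, so value iteration prefers looping. For contrast, the sketch also applies a potential-based shaping term F(s, a, s′) = γΦ(s′) − Φ(s) with a hypothetical potential Φ. These terms telescope along any trajectory, so they shift every action value at s by the same amount, −Φ(s), and leave the greedy policy unchanged. The names (`greedy_policy`, `potential`, and so on) are illustrative and are not taken from the course material.

```python
# Hypothetical toy setup (not from the source): a corridor of N cells, 0..N-1.
# Moving "right" from cell N-1 reaches the terminal exit; the wall clamps at 0.
N = 4
EXIT = N             # index of the terminal exit state
GAMMA = 0.95         # hypothetical discount factor
EXIT_REWARD = 10.0   # original reward, given only on reaching the exit
STEP_BONUS = 1.0     # the flawed constant shaping reward
ACTIONS = {"left": -1, "right": +1}


def step(s, a):
    """Deterministic transition and ORIGINAL reward (exit only)."""
    s2 = max(0, s + ACTIONS[a])
    return s2, (EXIT_REWARD if s2 == EXIT else 0.0)


def potential(s):
    """Hypothetical potential: higher closer to the exit, 0 at the terminal."""
    return 0.0 if s == EXIT else float(s - N)


def no_shaping(s, a, s2):
    return 0.0


def shaping_constant(s, a, s2):
    # Same bonus for every action in every state: NOT potential-based.
    return STEP_BONUS


def shaping_potential(s, a, s2):
    # Potential-based shaping: F = gamma * Phi(s') - Phi(s).
    return GAMMA * potential(s2) - potential(s)


def greedy_policy(shaping, iters=2000):
    """Value iteration on the shaped reward; returns the greedy action per cell."""
    V = [0.0] * (N + 1)  # V[EXIT] stays 0 because the exit is terminal

    def q(s, a):
        s2, r = step(s, a)
        return r + shaping(s, a, s2) + GAMMA * V[s2]

    for _ in range(iters):
        # Synchronous update: every q() call here reads the previous V.
        V = [max(q(s, a) for a in ACTIONS) for s in range(N)] + [0.0]
    # Ties are broken by dict order ("left" first), since max() keeps the first maximum.
    return [max(ACTIONS, key=lambda a: q(s, a)) for s in range(N)]


def reaches_exit(policy, start=0, max_steps=100):
    """Follow the policy from `start` and report whether it ever exits."""
    s = start
    for _ in range(max_steps):
        s, _ = step(s, policy[s])
        if s == EXIT:
            return True
    return False


if __name__ == "__main__":
    for name, fn in [("none", no_shaping),
                     ("constant bonus", shaping_constant),
                     ("potential-based", shaping_potential)]:
        pol = greedy_policy(fn)
        print(f"{name:16s} {pol} reaches exit: {reaches_exit(pol)}")
```

In the constant-bonus case, interior cells can tie between "left" and "right", so the exact looping path depends on tie-breaking. What does not depend on it is the last cell: there, moving away from the exit (about 20) strictly beats exiting (11), so the agent never finishes the maze.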
Critique of an Arbitrary Shaping Function
