Learn Before
Condition for Policy Invariance in Reward Shaping
When using a transformed reward function, the choice of the shaping reward function is critical for preserving the original optimal policy. To guarantee that the agent's optimal behavior is unchanged, the shaping function cannot be an arbitrary addition: it must take a specific, constrained form, namely the potential-based form built from a potential function over states.
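As a minimal sketch of that constrained form (the potential-based formula covered under Learn After, F(s, a, s') = γΦ(s') − Φ(s)), assuming a discrete state space; the discount factor `GAMMA` and the potential table `phi` below are illustrative values, not taken from this card:

```python
GAMMA = 0.9                      # discount factor (assumed for illustration)
phi = {"s0": 0.0, "s1": 3.0}     # potential function over states (assumed values)

def shaping_reward(s: str, s_next: str) -> float:
    """F(s, a, s') = gamma * Phi(s') - Phi(s): the constrained form that
    provably preserves the original optimal policy."""
    return GAMMA * phi[s_next] - phi[s]

def shaped_reward(r: float, s: str, s_next: str) -> float:
    """Transformed reward R'(s, a, s') = R(s, a, s') + F(s, a, s')."""
    return r + shaping_reward(s, s_next)

print(shaped_reward(-0.5, "s0", "s1"))  # -0.5 + (0.9 * 3.0 - 0.0) = 2.2
```

Because F depends only on the potentials of the two states (and the discount), the shaping terms telescope along any trajectory, which is why this form leaves the optimal policy intact.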
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Condition for Policy Invariance in Reward Shaping
A reinforcement learning agent is operating in an environment where taking a specific action in a given state results in a transition to a new state. The environment's original reward for this transition is -0.5. To guide the agent more effectively, a shaping function is added, which provides an additional reward value of +2.0 for this same transition. According to the standard formulation for reward shaping, what is the total transformed reward the agent receives?
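Under the standard additive formulation referenced in the question, the transformed reward is the sum of the original reward and the shaping reward:

R'(s, a, s') = R(s, a, s') + F(s, a, s') = -0.5 + 2.0 = +1.5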
Deconstructing a Shaped Reward Function
Analyzing Reward Components in a Maze Navigation Task
Learn After
Potential-Based Shaping Function Formula
Analysis of a Flawed Reward Shaping Implementation
A reinforcement learning agent is being trained to navigate a maze. The original reward function provides a large positive reward only upon reaching the exit. To speed up learning, a developer adds a shaping reward function that gives a small, constant positive reward for every single action the agent takes, regardless of the state. After this change, the agent learns a new policy of moving in a perpetual loop instead of solving the maze. Why did adding this specific shaping reward alter the optimal policy?
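A back-of-the-envelope sketch illustrates the failure mode: a constant per-step bonus is not potential-based, and once the discounted sum of bonuses from looping forever exceeds the value of reaching the exit, looping becomes the new optimal policy. All numeric values below (`GAMMA`, `C`, `R_EXIT`, and the 20-step solve) are illustrative assumptions, not taken from this card:

```python
GAMMA = 0.99   # discount factor (assumed)
C = 1.0        # constant shaping bonus paid on every action (assumed)
R_EXIT = 10.0  # original reward for reaching the exit (assumed)

def return_solve(k: int) -> float:
    """Shaped return for solving the maze in k steps:
    the bonus c on each step, plus the exit reward on the final step."""
    return sum(C * GAMMA**t for t in range(k)) + GAMMA**(k - 1) * R_EXIT

def return_loop() -> float:
    """Shaped return for looping forever: the geometric series c / (1 - gamma)."""
    return C / (1 - GAMMA)

print(return_solve(20))  # ~26.5
print(return_loop())     # 100.0 -- looping outscores solving, so the policy changes
```

A potential-based shaping term would have telescoped to zero around any loop, so no cycle could accumulate net bonus; the constant bonus violates exactly that property.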
Critique of an Arbitrary Shaping Function