Potential-Based Shaping Function Formula
To ensure that reward shaping does not alter the optimal policy, the shaping reward function must be defined as the difference between potential values of successive states. This is known as a potential-based shaping function, given by the formula: F(s, a, s') = γΦ(s') − Φ(s). Here, Φ is a real-valued potential function defined over the state space, and γ is the discount factor. This specific form of F guarantees that the optimal policy is preserved.
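The policy-invariance guarantee follows from a telescoping argument: summed along any trajectory, the shaping terms collapse to a quantity that depends only on the start and end states, so all policies are shifted equally. The sketch below (a hypothetical toy state space and potential function, not taken from the source) checks this numerically:

```python
# Minimal sketch of a potential-based shaping reward
# F(s, a, s') = gamma * phi(s') - phi(s), and a check that the
# discounted sum of shaping terms along a trajectory telescopes to
# gamma^T * phi(s_T) - phi(s_0), independent of the path taken.

gamma = 0.9

def phi(s):
    # Hypothetical potential over toy states 0..3:
    # negative distance to a goal state 3.
    return -(3 - s)

def shaping_reward(s, s_next):
    """Potential-based shaping term for the transition s -> s_next."""
    return gamma * phi(s_next) - phi(s)

def shaped_bonus(trajectory):
    """Discounted sum of shaping rewards along s_0, s_1, ..., s_T."""
    return sum(
        (gamma ** t) * shaping_reward(s, s_next)
        for t, (s, s_next) in enumerate(zip(trajectory, trajectory[1:]))
    )

traj = [0, 1, 2, 3]
T = len(traj) - 1
collapsed = (gamma ** T) * phi(traj[-1]) - phi(traj[0])
print(abs(shaped_bonus(traj) - collapsed) < 1e-12)  # True
```

Because the shaping contribution reduces to a start-state-dependent constant, the ordering of policies by expected return is unchanged, which is exactly why this form preserves the optimal policy.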

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Analysis of a Flawed Reward Shaping Implementation
A reinforcement learning agent is being trained to navigate a maze. The original reward function provides a large positive reward only upon reaching the exit. To speed up learning, a developer adds a shaping reward function that gives a small, constant positive reward for every single action the agent takes, regardless of the state. After this change, the agent learns a new policy of moving in a perpetual loop instead of solving the maze. Why did adding this specific shaping reward alter the optimal policy?
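The failure in the question above can be made concrete with a small calculation (the numeric values here are hypothetical, assuming an undiscounted episodic maze). A constant per-step bonus is not of the form γΦ(s') − Φ(s) with Φ fixed at terminal states, so it rewards prolonging the episode: exiting stops the bonus, while looping collects it indefinitely.

```python
# Hypothetical values illustrating why a constant per-action bonus
# changes the optimal policy in an episodic, undiscounted maze.

EXIT_REWARD = 10.0   # original sparse reward, paid only at the exit
STEP_BONUS = 0.1     # flawed shaping: constant reward for every action

def shaped_return_exit(steps_to_exit):
    """Total shaped return for a policy that reaches the exit."""
    return steps_to_exit * STEP_BONUS + EXIT_REWARD

def shaped_return_loop(steps):
    """Total shaped return after `steps` actions of perpetual looping."""
    return steps * STEP_BONUS

print(shaped_return_exit(5))    # 10.5 -- solve the maze in 5 steps
print(shaped_return_loop(200))  # 20.0 -- looping overtakes exiting
```

Since the looping policy's return grows without bound while any exiting policy's return is finite, the shaped MDP's optimal policy is to loop forever, which is exactly the behavior the developer observed.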
Critique of an Arbitrary Shaping Function
Learn After
Value-Based Reward Shaping Formula
A reinforcement learning engineer wants to add an extra reward signal, denoted as a function f, to an agent's learning process to encourage more efficient exploration. They have access to a function Φ(s) which provides a numerical estimate of a state's value, and a discount factor γ. To guarantee that this additional reward signal does not alter the agent's optimal long-term behavior, which of the following structures must the function f have for a transition from state s_t to s_{t+1}?
Analyzing a Flawed Reward Shaping Implementation
Validating a Potential-Based Shaping Function