Learn Before
Reward Shaping Formula
In reward shaping, a new reward signal, known as the transformed reward function $\tilde{r}$, is created by adding a shaping reward function $F$ to the environment's original reward function $r$. This relationship is expressed by the formula:

$$\tilde{r}(s, a, s') = r(s, a, s') + F(s, a, s')$$

This technique provides an agent with additional feedback, where all three functions depend on the current state $s$, the action $a$, and the next state $s'$.
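To make the formula concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than material from the source: a small grid world with a hypothetical exit cell, a potential function $\Phi$ based on distance to the exit, and a potential-based shaping term $F(s, a, s') = \gamma\,\Phi(s') - \Phi(s)$ (the form referenced in the "Condition for Policy Invariance" item below).

```python
# Minimal sketch of reward shaping in a grid world. All names, the goal
# cell, and the distance-based potential are illustrative assumptions.

GAMMA = 0.99      # discount factor (assumed)
GOAL = (4, 4)     # hypothetical exit cell of a 5x5 maze

def potential(state):
    """Phi(s): negative Manhattan distance to the exit, so states
    nearer the exit have higher potential."""
    x, y = state
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))

def shaping_reward(state, action, next_state):
    """F(s, a, s') = gamma * Phi(s') - Phi(s): the potential-based form."""
    return GAMMA * potential(next_state) - potential(state)

def transformed_reward(reward, state, action, next_state):
    """r~(s, a, s') = r(s, a, s') + F(s, a, s')."""
    return reward + shaping_reward(state, action, next_state)

# A step toward the exit: the original reward is 0, but the shaped
# signal gives immediate positive feedback for the progress made.
print(transformed_reward(0.0, (0, 0), "right", (1, 0)))  # ~= 1.07
```

Because $F$ here is potential-based, the agent receives a small bonus for each step of progress toward the exit without changing which path is optimal.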

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Reward Shaping Formula
An agent is being trained to navigate a complex maze. It receives a large positive reward (+100) only upon reaching the exit, and a reward of 0 for all other steps. To accelerate learning in this environment with delayed feedback, a developer decides to add an additional, intermediate reward at each step. Which of the following intermediate reward strategies is most likely to guide the agent effectively toward the exit without inadvertently changing the optimal path?
Analyzing Reward Shaping Strategies for Text Summarization
A key advantage of implementing a potential-based reward shaping function is that it fundamentally alters the optimal set of actions an agent should take, thereby simplifying complex problems with sparse rewards.
Learn After
Condition for Policy Invariance in Reward Shaping
A reinforcement learning agent is operating in an environment where taking a specific action in a given state results in a transition to a new state. The environment's original reward for this transition is -0.5. To guide the agent more effectively, a shaping function is added, which provides an additional reward value of +2.0 for this same transition. According to the standard formulation for reward shaping, what is the total transformed reward the agent receives? (A worked check of this computation appears after this list.)
Deconstructing a Shaped Reward Function
Analyzing Reward Components in a Maze Navigation Task
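As a quick worked check of the formula above, applied to the transition described in the "Condition for Policy Invariance in Reward Shaping" question (original reward $-0.5$, shaping reward $+2.0$):

$$\tilde{r}(s, a, s') = r(s, a, s') + F(s, a, s') = -0.5 + 2.0 = +1.5$$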