Concept

Condition for Policy Invariance in Reward Shaping

When the reward function is transformed by adding a shaping reward, the form of that shaping term determines whether the original optimal policy is preserved. To leave the agent's optimal behavior unchanged, the shaping function cannot be an arbitrary addition. It must take a specific, constrained form; the standard one is potential-based, F(s, a, s') = γΦ(s') − Φ(s), where Φ is a real-valued function of the state and γ is the discount factor.
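A minimal sketch of this condition on a toy problem, assuming the potential-based form F(s, a, s') = γΦ(s') − Φ(s) from the standard policy-invariance result for reward shaping. The small deterministic MDP, the random potential `phi`, and all function names below are hypothetical and chosen only for illustration. The script solves the MDP once with the original reward and once with the shaped reward, then compares the greedy policies and checks that the optimal Q-values differ only by the state-dependent offset Φ(s).

```python
import random


def q_value_iteration(n_states, n_actions, next_state, reward, gamma, sweeps=2000):
    """Tabular Q-value iteration for a deterministic MDP (illustrative helper)."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        # Bellman optimality backup for every (state, action) pair.
        q = [
            [
                reward(s, a, next_state[s][a]) + gamma * max(q[next_state[s][a]])
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Hypothetical toy MDP: random deterministic transitions and base rewards.
rng = random.Random(0)
n_states, n_actions, gamma = 5, 3, 0.9
next_state = [[rng.randrange(n_states) for _ in range(n_actions)] for _ in range(n_states)]
base = [[rng.uniform(-1.0, 1.0) for _ in range(n_actions)] for _ in range(n_states)]

# Arbitrary potential over states (assumption: any real-valued function works).
phi = [rng.uniform(-5.0, 5.0) for _ in range(n_states)]


def original_reward(s, a, s_next):
    return base[s][a]


def shaped_reward(s, a, s_next):
    # Original reward plus the potential-based shaping term.
    return base[s][a] + gamma * phi[s_next] - phi[s]


q = q_value_iteration(n_states, n_actions, next_state, original_reward, gamma)
q_shaped = q_value_iteration(n_states, n_actions, next_state, shaped_reward, gamma)

# Invariance check: both reward functions should induce the same greedy policy.
policy_invariant = greedy_policy(q) == greedy_policy(q_shaped)

# The shaped optimal Q-values should equal the originals minus phi[s].
max_gap = max(
    abs(q_shaped[s][a] - (q[s][a] - phi[s]))
    for s in range(n_states)
    for a in range(n_actions)
)

print("same optimal policy:", policy_invariant)
print("largest deviation from Q - phi:", max_gap)
```

Because the offset Φ(s) is the same for every action in a given state, it cannot change which action has the highest value, which is why the optimal policy is preserved. A shaping term that also depended on the action in an unconstrained way would not offer this guarantee.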

Updated 2025-10-06

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences