Goodhart's Law in Reward Modeling
Goodhart's Law provides a theoretical explanation for the overoptimization problem: once a measure, such as a learned reward score, is made the optimization target, it ceases to be a reliable indicator of the quality it was intended to represent. In reward modeling, the reward model is only a proxy for human preferences, so a policy trained to maximize it aggressively will eventually exploit the proxy's flaws rather than improve genuine output quality.
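To make the failure mode concrete, here is a minimal sketch in Python. It is illustrative only: the one-dimensional policy, the functional forms of proxy_reward and true_quality, and the hill-climbing loop are all assumptions chosen to expose the effect, not an implementation from the text.

```python
# Toy illustration of Goodhart's Law in reward modeling.
# The "policy" is reduced to a single number: the fraction of a summary's
# tokens that are keywords. Both scoring functions below are assumed
# forms chosen for illustration, not real reward models.

def proxy_reward(keyword_density: float) -> float:
    """Proxy metric used as the optimization target: keyword density alone."""
    return keyword_density


def true_quality(keyword_density: float) -> float:
    """Assumed underlying quality: keywords help coverage, but a summary
    that is mostly keywords becomes unreadable. Peaks at density 0.5."""
    return 4.0 * keyword_density * (1.0 - keyword_density)


def optimize_proxy(steps: int = 9, step_size: float = 0.1) -> None:
    """Greedy hill climbing on the proxy reward."""
    density = 0.1  # initial policy: few keywords, readable prose
    for step in range(steps):
        # Under the proxy, adding keywords always scores higher.
        density = min(1.0, density + step_size)
        print(f"step {step}: density={density:.1f}  "
              f"proxy={proxy_reward(density):.2f}  "
              f"true={true_quality(density):.2f}")


if __name__ == "__main__":
    optimize_proxy()
```

Running the sketch, the proxy score climbs monotonically toward its maximum while the assumed true quality peaks near a balanced keyword density and then collapses toward zero: the reward keeps improving even as the output degenerates, which is exactly the divergence Goodhart's Law predicts.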