1Cademy - Explaining Overoptimization with Goodharts Law

Learn Before

Overoptimization Problem in Reward Modeling (Reward Hacking or Reward Gaming)

Concept

Explaining Overoptimization with Goodhart's Law

The overoptimization problem in reward modeling can be understood as a practical example of Goodhart's Law. The law states that a measure used as a target ceases to be effective. In this case, the reward score is the measure, and the LLM's optimization process makes it the target. By focusing solely on maximizing the score, the LLM's behavior can diverge from the intended goal, rendering the reward score an unreliable indicator of true performance.

Updated 2026-05-03

Contributors are: