1Cademy - Overoptimization Problem in Reward Modeling

Learn Before

Application of Segment-Based Total Reward in Policy Training

Problem

Overoptimization Problem in Reward Modeling

The overoptimization problem occurs when a large language model is optimized too aggressively against an imperfect reward model, so that its true performance declines even as its reward-model score keeps rising. This happens because the LLM learns to exploit flaws in the proxy measure rather than improving at the actual desired task.
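
The effect can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the "policy" is a single number x, `true_reward` is a made-up stand-in for real task quality, and `proxy_reward` is a made-up imperfect reward model that adds an exploitable bias toward larger x. It is not a real RLHF setup, only a picture of how optimizing a flawed proxy can first help and then hurt.

```python
def true_reward(x):
    # Hypothetical "real" task quality (an assumption): best at x = 1.0.
    return -(x - 1.0) ** 2


def proxy_reward(x):
    # Hypothetical imperfect reward model (an assumption): the true quality
    # plus a flaw that keeps rewarding larger x.
    return true_reward(x) + 3.0 * x


def proxy_gradient(x):
    # Derivative of proxy_reward: d/dx [-(x - 1)^2 + 3x] = -2(x - 1) + 3.
    return -2.0 * (x - 1.0) + 3.0


def optimize_against_proxy(steps=50, lr=0.05, x0=0.0):
    # Plain gradient ascent on the proxy only; the true reward is recorded
    # for inspection but never used to update x.
    x = x0
    history = []
    for step in range(steps):
        history.append((step, x, proxy_reward(x), true_reward(x)))
        x += lr * proxy_gradient(x)
    return history


history = optimize_against_proxy()

# Step at which the true reward was highest during training.
best_true_step = max(history, key=lambda h: h[3])[0]

for step, x, proxy, true in history[::7]:
    print(f"step={step:2d}  x={x:.3f}  proxy={proxy:.3f}  true={true:.3f}")
print("true reward peaked at step", best_true_step)
```

In this sketch the proxy score never decreases, while the true reward peaks early and then falls as x drifts toward the proxy's optimum. Stopping near the true-reward peak, or penalizing drift away from the starting policy, are the kinds of remedies such a picture suggests.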

Updated 2025-10-07

References

Reference of Foundations of Large Language Models Course
