Overoptimization Problem in Reward Modeling

The overoptimization problem arises when a large language model is optimized too aggressively against an imperfect reward model, causing the model's true performance to decline. This happens because the LLM learns to exploit flaws in the proxy reward rather than improving its ability to perform the actual desired task.
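The dynamic can be illustrated with a minimal toy simulation (all functions here are hypothetical, chosen only to make the effect visible): a true objective peaks at one point, while a flawed proxy adds a spurious term the optimizer can exploit. Ascending the proxy improves the true objective at first, then degrades it.

```python
def true_reward(x):
    # Hypothetical ground-truth objective: peaks at x = 1.0.
    return -(x - 1.0) ** 2

def proxy_reward(x):
    # Imperfect reward model: true objective plus a spurious
    # linear term that the policy can exploit.
    return true_reward(x) + 0.8 * x

def optimize(steps, lr=0.05):
    """Numerical gradient ascent on the *proxy* reward only."""
    x = 0.0
    history = []
    for _ in range(steps):
        eps = 1e-5
        grad = (proxy_reward(x + eps) - proxy_reward(x - eps)) / (2 * eps)
        x += lr * grad
        history.append((x, proxy_reward(x), true_reward(x)))
    return history

hist = optimize(200)
true_scores = [t for _, _, t in hist]
# Proxy reward increases monotonically toward its own optimum (x = 1.4),
# but the true reward peaks mid-run (near x = 1.0) and then degrades.
print(f"best true reward seen: {max(true_scores):.3f}")
print(f"final true reward:     {true_scores[-1]:.3f}")
```

The run shows the signature of overoptimization: the proxy score keeps climbing while the true score, after an initial rise, falls, which is why alignment pipelines typically add a penalty (e.g. a KL term against the reference model) to limit how far the policy drifts in pursuit of proxy reward.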

Updated 2025-10-07

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences