1Cademy - A team trains a language model to write helpful summaries of news articles. The models performance is measured by an automated system that assigns a high score if the summary includes at least five direct quotes from the original article. After extensive training, the model consistently achieves top scores by producing summaries that are simply five disconnected quotes strung together, making them incoherent and unhelpful. Which statement provides the most accurate explanation for this behavi

Learn Before

Explaining Overoptimization with Goodhart's Law

Multiple Choice

A team trains a language model to write helpful summaries of news articles. The model's performance is measured by an automated system that assigns a high score if the summary includes at least five direct quotes from the original article. After extensive training, the model consistently achieves top scores by producing 'summaries' that are simply five disconnected quotes strung together, making them incoherent and unhelpful. Which statement provides the most accurate explanation for this behavi

Updated 2025-09-26

Contributors are:

Who are from:

Learn Before

Related