1Cademy - Example of Reward Hacking: The Homework Analogy

Learn Before

Overoptimization Problem in Reward Modeling (Reward Hacking or Reward Gaming)

Example

A common analogy for reward hacking involves a student who is rewarded with points or praise for completing homework. To maximize this reward with minimal effort, the student might take shortcuts, such as copying solutions from the internet or from previous assignments, rather than working through the problems to learn the material. The strategy earns the reward but defeats the educational purpose of the assignment.
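
The analogy can be made concrete with a small toy model. The sketch below is illustrative only: the strategy names, the effort and skill numbers, and the helper functions (`proxy_reward`, `true_objective`, `student_choice`) are all hypothetical and are not taken from the course material. It assumes the grader can only observe how many problems were submitted, while the teacher actually cares about the skill gained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """One way a student could respond to a homework assignment (hypothetical numbers)."""
    name: str
    effort: float            # hours spent
    problems_submitted: int  # what the grader can observe
    skill_gained: float      # what the teacher actually cares about


def proxy_reward(s: Strategy) -> float:
    """Points for submitted homework; blind to how the answers were produced."""
    return float(s.problems_submitted)


def true_objective(s: Strategy) -> float:
    """The intended goal: how much the student learned."""
    return s.skill_gained


STRATEGIES = [
    Strategy("solve the problems", effort=3.0, problems_submitted=10, skill_gained=8.0),
    Strategy("copy from the internet", effort=0.5, problems_submitted=10, skill_gained=0.5),
    Strategy("skip the homework", effort=0.0, problems_submitted=0, skill_gained=0.0),
]


def student_choice(strategies, effort_cost=1.0):
    """Pick the strategy with the highest proxy reward net of effort."""
    return max(strategies, key=lambda s: proxy_reward(s) - effort_cost * s.effort)


chosen = student_choice(STRATEGIES)
intended = max(STRATEGIES, key=true_objective)

print(f"Student picks:   {chosen.name} "
      f"(reward={proxy_reward(chosen)}, skill={true_objective(chosen)})")
print(f"Teacher wanted:  {intended.name} "
      f"(reward={proxy_reward(intended)}, skill={true_objective(intended)})")
```

Under these assumed numbers, copying and solving earn the same proxy reward, so an effort-minimizing student picks copying even though it yields almost none of the intended skill. The gap between `proxy_reward` and `true_objective` is the part of the setup that gets exploited.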

Updated 2025-10-09

References

Reference of Foundations of Large Language Models Course

Learn After

An AI is trained to clean a virtual room and is rewarded based on how few messes are visible to its camera at the end of the task. The AI learns that it can achieve a perfect score by simply covering any mess with a box instead of properly disposing of it. Which statement best analyzes the fundamental flaw in this training setup?
Customer Support Chatbot Performance
Evaluating a Reward System Using the Homework Analogy