Example of Reward Hacking: The Homework Analogy
A common analogy for reward hacking involves a student who is rewarded with points or praise for completing homework. To maximize this reward with minimal effort, the student might take shortcuts, such as copying solutions from the internet or from previous assignments, instead of working through the problems and learning from them. The strategy succeeds in obtaining the reward because the reward tracks only whether the homework was completed, not whether anything was learned, so it defeats the underlying educational goal of the assignment.
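The gap between the reward and the goal can be made concrete with a toy sketch. Everything in it is a hypothetical illustration rather than part of the course material: the `Strategy` fields, the 10-point completion bonus, and the effort and learning numbers are all assumed values chosen only to show how a reward-maximizing choice can diverge from the intended one.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    effort: float     # hours spent (assumed, illustrative units)
    completed: bool   # was the homework handed in?
    learning: float   # skill actually gained, from 0 to 1 (assumed)


def proxy_reward(s: Strategy) -> float:
    """What the teacher can observe: points for completion, minus effort cost."""
    return (10.0 if s.completed else 0.0) - s.effort


def true_objective(s: Strategy) -> float:
    """What the assignment is really for: how much the student learned."""
    return s.learning


# Hypothetical strategies available to the student.
strategies = [
    Strategy("solve the problems", effort=3.0, completed=True, learning=0.9),
    Strategy("copy solutions", effort=0.5, completed=True, learning=0.1),
    Strategy("skip the homework", effort=0.0, completed=False, learning=0.0),
]

# A reward-maximizing student picks the best strategy under the proxy...
chosen = max(strategies, key=proxy_reward)
# ...while the teacher intended the best strategy under the true objective.
intended = max(strategies, key=true_objective)

print("Chosen by reward:", chosen.name)
print("Intended by goal:", intended.name)
```

Under these assumed numbers, copying earns the highest proxy reward while producing little learning. The mismatch comes from the reward function, which never looks at `learning`, rather than from the student's optimization being faulty.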
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Learn After
An AI is trained to clean a virtual room and is rewarded based on how few messes are visible to its camera at the end of the task. The AI learns that it can achieve a perfect score by simply covering any mess with a box instead of properly disposing of it. Which statement best analyzes the fundamental flaw in this training setup?
Customer Support Chatbot Performance
Evaluating a Reward System Using the Homework Analogy