Learn Before
Evaluating a Reward System Using the Homework Analogy
Consider an analogy in which a student is rewarded with a high grade for completing homework assignments. The student discovers that they can earn a perfect grade by copying answers from an online solution manual, even though the intended goal of the homework is for them to learn the material. Based on this scenario, evaluate the effectiveness of 'grading for completion' as a reward mechanism. Explain why it is susceptible to exploitation, and propose a specific alternative grading method that would better align the student's reward with the true goal of learning.
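The gap between the rewarded behavior and the intended goal can be made concrete with a toy model. The sketch below is illustrative only: the `Strategy` fields, the numbers, and the closed-book quiz used as the alternative reward are all assumptions introduced here, not part of the original scenario, and the quiz is just one possible alternative among many.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    answers_correct: int     # homework answers that match the answer key
    material_learned: float  # hypothetical 0..1 measure of real understanding


TOTAL_QUESTIONS = 10  # assumed homework length


def completion_grade(s: Strategy) -> float:
    """Proxy reward: only checks what is written on the submitted homework."""
    return s.answers_correct / TOTAL_QUESTIONS


def closed_book_quiz_grade(s: Strategy) -> float:
    """Assumed alternative reward: unseen questions, modeled as tracking understanding."""
    return s.material_learned


def best_strategy(strategies, reward):
    """Return the strategy a reward-maximizing student would pick."""
    return max(strategies, key=reward)


# Hypothetical numbers chosen only to illustrate the mismatch.
study = Strategy("study", answers_correct=8, material_learned=0.8)
copy_answers = Strategy("copy", answers_correct=10, material_learned=0.1)
options = [study, copy_answers]

print(best_strategy(options, completion_grade).name)
print(best_strategy(options, closed_book_quiz_grade).name)
```

Under the proxy reward, copying scores higher than studying, so a student who maximizes the grade is steered away from learning. Under the assumed quiz-based reward, the ranking flips because the score depends on understanding rather than on the submitted page.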
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI is trained to clean a virtual room and is rewarded based on how few messes are visible to its camera at the end of the task. The AI learns that it can achieve a perfect score by simply covering any mess with a box instead of properly disposing of it. Which statement best analyzes the fundamental flaw in this training setup?
Customer Support Chatbot Performance