Designing a Grid World Reward Function
Consider an agent navigating a 10x10 grid. The agent's goal is to move from a starting square to a designated goal square. Certain squares are marked as 'hazards'. The agent receives a reward after each move. Your task is to design a reward function that encourages the agent to reach the goal efficiently while avoiding the hazards. Describe the reward values you would assign for the following three outcomes:
- Reaching the goal square.
- Entering a hazard square.
- Making any other move (i.e., moving to a normal, non-goal, non-hazard square).
Provide a specific numerical value for each outcome and briefly justify your choices.
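As a point of comparison, one possible assignment can be sketched in Python. The specific values (+10 for the goal, -10 for a hazard, -0.1 per ordinary move) and the names `reward` and `path_return` are illustrative assumptions, not a prescribed answer. The small per-step penalty makes shorter routes score higher, and the hazard penalty is large enough that a detour of a few extra steps is still preferable to crossing a hazard.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]  # (row, column) on the grid

# Assumed example values; other choices with the same ordering can also work.
GOAL_REWARD = 10.0      # large positive: reaching the goal is the objective
HAZARD_PENALTY = -10.0  # large negative: hazards should be avoided
STEP_PENALTY = -0.1     # small negative: every extra move costs a little


def reward(next_cell: Cell, goal: Cell, hazards: Set[Cell]) -> float:
    """Return the reward for moving into next_cell."""
    if next_cell == goal:
        return GOAL_REWARD
    if next_cell in hazards:
        return HAZARD_PENALTY
    return STEP_PENALTY


def path_return(path: Iterable[Cell], goal: Cell, hazards: Set[Cell]) -> float:
    """Sum the (undiscounted) rewards collected along a sequence of entered cells."""
    return sum(reward(cell, goal, hazards) for cell in path)


if __name__ == "__main__":
    goal = (0, 3)
    hazards = {(1, 1)}
    short_path = [(0, 1), (0, 2), (0, 3)]
    long_path = [(1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 3)]
    # The shorter safe path collects a higher total than the longer one.
    print(path_return(short_path, goal, hazards) > path_return(long_path, goal, hazards))
```

With these values, a safe detour would have to be about 100 steps longer before it scored worse than stepping on a single hazard, which is more than enough margin on a 10x10 grid.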
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing Flawed Agent Behavior
A reinforcement learning agent controls a robot vacuum cleaner. The primary goal is for the robot to collect all pieces of trash in a room as quickly as possible. Which of the following reward function designs would be most effective at encouraging the desired behavior without leading to unintended negative consequences?