Short Answer

Designing a Grid World Reward Function

Consider an agent navigating a 10x10 grid. The agent's goal is to move from a starting square to a designated goal square. Certain squares are marked as 'hazards'. The agent receives a reward after each move. Your task is to design a reward function that encourages the agent to reach the goal efficiently while avoiding the hazards. Describe the reward values you would assign for the following three outcomes:

  1. Reaching the goal square.
  2. Entering a hazard square.
  3. Making any other move (i.e., moving to a normal, non-goal, non-hazard square).

Provide a specific numerical value for each outcome and briefly justify your choices.
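The following is a minimal Python sketch of one way such a reward function could be structured and compared across paths. The values +10 (goal), -10 (hazard) and -1 (ordinary move) are an illustrative assumption, a common pattern rather than the expected answer. The helper names `reward` and `episode_return` and the example grid layout are hypothetical. The sketch also assumes the episode continues after the agent enters a hazard, which the question does not specify.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col) on the 10x10 grid

# Illustrative reward values (assumption): one possible choice, not a prescribed answer.
GOAL_REWARD = 10.0     # large positive reward for reaching the goal
HAZARD_REWARD = -10.0  # large negative reward for entering a hazard
STEP_REWARD = -1.0     # small per-move cost that favors short paths


def reward(next_cell: Cell, goal: Cell, hazards: Set[Cell]) -> float:
    """Reward for a move that lands the agent on next_cell (hypothetical helper)."""
    if next_cell == goal:
        return GOAL_REWARD
    if next_cell in hazards:
        return HAZARD_REWARD
    return STEP_REWARD


def episode_return(path: List[Cell], goal: Cell, hazards: Set[Cell]) -> float:
    """Undiscounted sum of rewards over the cells entered, one per move."""
    return sum(reward(cell, goal, hazards) for cell in path)


if __name__ == "__main__":
    goal = (0, 3)
    hazards = {(0, 1)}
    # Each path lists the cells entered after leaving the start square (0, 0).
    through_hazard = [(0, 1), (0, 2), (0, 3)]
    short_detour = [(1, 0), (1, 1), (1, 2), (1, 3), (0, 3)]
    long_detour = [(1, 0), (2, 0), (2, 1), (2, 2), (2, 3), (1, 3), (0, 3)]
    for name, path in [("through hazard", through_hazard),
                       ("short detour", short_detour),
                       ("long detour", long_detour)]:
        print(name, episode_return(path, goal, hazards))
```

Under these values the short detour (return 6) beats both the long detour (4) and the shortcut through the hazard (-1): the step cost favors shorter paths, and the hazard penalty outweighs the moves a hazard shortcut saves. Because entering one hazard costs 9 more than an ordinary move, a safe route is preferred unless it is more than 9 moves longer; a larger hazard penalty widens that margin.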

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science