1Cademy - Robot Navigation Path Selection

Learn Before

Total Reward (Return)

Case Study

Robot Navigation Path Selection

An autonomous robot is being trained to navigate a warehouse, with the goal of reaching a charging station. It receives a reward of -1 for each second it takes to complete the task and an additional reward of -50 each time it collides with an obstacle. The robot completes two trial runs (episodes) with the following outcomes:

Episode 1: The robot follows a cautious, longer route and reaches the station in 30 seconds without any collisions.
Episode 2: The robot attempts a shortcut, but collides with an obstacle once before reaching the station in 15 seconds.

Based on the objective of maximizing the cumulative sum of rewards, which episode represents a better outcome for the agent? Justify your answer by calculating the total reward for each episode.
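
One way to check the arithmetic is a minimal Python sketch like the one below. It assumes the return is undiscounted (a plain sum of rewards) and that time is counted in whole seconds; the names `episode_return`, `STEP_REWARD`, and `COLLISION_REWARD` are hypothetical and chosen only for this illustration.

```python
# Reward values taken from the case study description.
STEP_REWARD = -1        # reward received for each second elapsed
COLLISION_REWARD = -50  # reward received for each collision

def episode_return(seconds, collisions):
    """Undiscounted return: the sum of all time and collision rewards."""
    return STEP_REWARD * seconds + COLLISION_REWARD * collisions

# Outcomes of the two episodes as stated in the case study.
episodes = {
    "Episode 1": episode_return(seconds=30, collisions=0),
    "Episode 2": episode_return(seconds=15, collisions=1),
}

# The better episode is the one with the larger (less negative) return.
best = max(episodes, key=episodes.get)

for name, total in episodes.items():
    print(f"{name}: return = {total}")
print(f"Better outcome: {best}")
```

Under these assumptions, Episode 1 has a return of -30 and Episode 2 has a return of -15 + (-50) = -65, so the slower, collision-free route gives the higher total reward.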


Updated 2025-10-03
