Case Study

Robot Navigation Path Selection

An autonomous robot is being trained to navigate a warehouse. The robot's goal is to reach a charging station. It receives a reward of -1 for each second it takes to complete the task and an additional reward of -50 each time it collides with an obstacle. The robot completes two trial runs (episodes) with the following outcomes:

  • Episode 1: The robot takes a cautious, longer route, taking 30 seconds to reach the station without any collisions.
  • Episode 2: The robot attempts a shortcut and collides with an obstacle once, reaching the station in a total of 15 seconds.

Based on the objective of maximizing the cumulative sum of rewards, which episode represents a better outcome for the agent? Justify your answer by calculating the total reward for each episode.
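One way to check the arithmetic is a minimal Python sketch of the return calculation. It assumes an undiscounted sum of rewards, that the -50 applies once per collision, and that the stated times are total episode durations. The names `STEP_REWARD`, `COLLISION_PENALTY`, and `episode_return` are hypothetical and introduced only for illustration.

```python
# Reward values taken from the case study description.
STEP_REWARD = -1          # reward per second elapsed
COLLISION_PENALTY = -50   # reward per collision (assumed to apply per collision)


def episode_return(seconds: int, collisions: int) -> int:
    """Undiscounted cumulative reward for one episode (assumption: no discounting)."""
    return seconds * STEP_REWARD + collisions * COLLISION_PENALTY


episodes = {
    "Episode 1": episode_return(seconds=30, collisions=0),
    "Episode 2": episode_return(seconds=15, collisions=1),
}

# The better outcome is the episode with the largest (least negative) return.
best = max(episodes, key=episodes.get)

for name, total in episodes.items():
    print(f"{name}: total reward = {total}")
print(f"Better outcome: {best}")
```

Under these assumptions, Episode 1 totals -30 and Episode 2 totals -15 + (-50) = -65, so the slower, collision-free run has the higher cumulative reward.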

Updated 2025-10-03

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science