Learn Before
An agent is being trained to find the best route through a system. It is presented with two options:
- Route 1: Provides a consistent, small positive reward at every step, resulting in a total reward of +15 for the entire route.
- Route 2: Starts with a step that gives a negative reward (a penalty) of -5, but subsequent steps lead to very high rewards, resulting in a total reward of +50 for the entire route.
An agent that has been successfully trained according to the primary objective of its learning framework will learn to choose Route 2. Which of the following statements best explains why?
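To make the expected answer concrete, here is a minimal sketch (not part of the original card) of an agent comparing undiscounted cumulative rewards. The per-step reward values are hypothetical, chosen only so the totals match the +15 and +50 stated above.

```python
# Minimal sketch: an agent that maximizes cumulative reward prefers the
# route with the higher total, regardless of early penalties.
# Per-step values below are hypothetical; only the sums (+15, +50) are given.

routes = {
    "Route 1": [3, 3, 3, 3, 3],           # small positive reward each step -> +15
    "Route 2": [-5, 11, 11, 11, 11, 11],  # early -5 penalty, high later rewards -> +50
}

# Undiscounted return: the sum of rewards along the route.
returns = {name: sum(rewards) for name, rewards in routes.items()}
best = max(returns, key=returns.get)

print(returns)  # {'Route 1': 15, 'Route 2': 50}
print(best)     # Route 2: the cumulative objective favors it despite the -5 start
```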
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Objective Function as Expected Cumulative Reward (Performance Function)
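For reference, the objective named by this related card is conventionally written as the expected cumulative (optionally discounted) reward. The notation below is the standard textbook form and may differ from the linked card's exact statement:

```latex
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r_{t}\right], \qquad 0 < \gamma \le 1,
```

where \(\pi\) is the agent's policy, \(r_t\) is the reward received at step \(t\), and \(\gamma\) is the discount factor.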
Analysis of a Suboptimal Agent Policy
An agent is learning to play a game in which the objective is to achieve the highest possible final score. At a critical decision point, the agent chooses an action that yields an immediate reward of 0, passing up an alternative that would have given an immediate reward of +10. This decision necessarily indicates that the agent's policy is flawed and not aligned with the primary goal of its learning framework.
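The claim above does not hold in general: an agent maximizing the final score can rationally pass up immediate reward when the 0-reward action leads to a higher cumulative total. The numbers in this sketch are hypothetical and purely illustrative.

```python
# Sketch of why forgoing immediate reward is not necessarily a flaw.
# Hypothetical two-step game; values are illustrative, not from the card.

# Option A: take +10 now, after which no further reward is available.
option_a = [10, 0]

# Option B: take 0 now, which opens a position worth +100 on the next step.
option_b = [0, 100]

print(sum(option_a))  # 10
print(sum(option_b))  # 100 -> the 0-reward action maximizes the final score
```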