1Cademy - An agent is learning to play a game where the objective is to get the highest possible final score. At a critical decision point, the agent chooses an action that yields an immediate reward of 0, passing up an alternative action that would have given an immediate reward of +10. This decision is necessarily an indication that the agent's policy is flawed and not aligned with the primary goal of its learning framework.

Route 1: Provides a consistent, small positive reward at every step, resulting in a total reward of +15 for the entire route.
Route 2: Starts with a step that gives a negative reward (a penalty) of -5, but subsequent steps lead to very high rewards, resulting in a total reward of +50 for the entire route.
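
The comparison between the two routes can be sketched in a few lines of Python. This is only an illustration: the per-step reward lists below are hypothetical values chosen to match the totals stated above (+15 and +50) and the initial -5 penalty on Route 2; the step counts, the individual rewards, and the helper names `total_return`, `greedy_first_step`, and `best_by_return` are assumptions, not part of the original material.

```python
# Hypothetical per-step rewards. Only the totals (+15 and +50) and the
# -5 first step of Route 2 come from the text; everything else is assumed.
route_1 = [3, 3, 3, 3, 3]      # small positive reward at every step, total +15
route_2 = [-5, 15, 20, 20]     # initial penalty, then high rewards, total +50

routes = {"Route 1": route_1, "Route 2": route_2}


def total_return(rewards):
    """Undiscounted cumulative reward over the whole route."""
    return sum(rewards)


def greedy_first_step(routes):
    """Pick the route whose first step has the highest immediate reward."""
    return max(routes, key=lambda name: routes[name][0])


def best_by_return(routes):
    """Pick the route with the highest total reward."""
    return max(routes, key=lambda name: total_return(routes[name]))


if __name__ == "__main__":
    for name, rewards in routes.items():
        print(name, "first step:", rewards[0], "total:", total_return(rewards))
    print("Greedy on first step:", greedy_first_step(routes))
    print("Best by total return:", best_by_return(routes))
```

Under these assumed numbers, ranking by the first step alone and ranking by total reward select different routes, which is the distinction the two route descriptions are drawing.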

Learn Before

Goal of Reinforcement Learning

True/False

An agent is learning to play a game where the objective is to get the highest possible final score. At a critical decision point, the agent chooses an action that yields an immediate reward of 0, passing up an alternative action that would have given an immediate reward of +10. This decision is necessarily an indication that the agent's policy is flawed and not aligned with the primary goal of its learning framework.

Updated 2025-10-06
