Learn Before
Multiple Choice

An agent is being trained to find the best route through a system. It is presented with two options:

  • Route 1: Provides a consistent, small positive reward at every step, resulting in a total reward of +15 for the entire route.
  • Route 2: Starts with a step that gives a negative reward (a penalty) of -5, but subsequent steps lead to very high rewards, resulting in a total reward of +50 for the entire route.

An agent that has been successfully trained according to the primary objective of its learning framework will learn to choose Route 2. Which of the following statements best explains why?

0

1

Updated 2025-09-29

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science