Multiple Choice

An autonomous agent completes a task over four time steps. The sequence of actions and resulting rewards is as follows:

  • Time t=1: Action a_1 -> Reward r_1 = 0
  • Time t=2: Action a_2 -> Reward r_2 = 0
  • Time t=3: Action a_3 -> Reward r_3 = -1
  • Time t=4: Action a_4 -> Reward r_4 = +10

When evaluating the decision to take action a_2 at time t=2, which rewards should be considered as being potentially influenced by this specific action?
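One way to reason about this is the causality argument commonly used in policy-gradient methods: an action cannot change rewards that were already received before it was taken. The sketch below is illustrative only and assumes that reward r_t is the result of action a_t, as the arrows in the list above suggest. The function names `rewards_from_step` and `reward_to_go` are hypothetical helpers, not part of the original question.

```python
from typing import List


def rewards_from_step(rewards: List[float], t: int) -> List[float]:
    """Return r_t, ..., r_T for a 1-indexed time step t.

    Assumption: r_t is produced by a_t, so a_t can only affect
    rewards from step t onward, never earlier ones.
    """
    if not 1 <= t <= len(rewards):
        raise ValueError("t must be between 1 and the episode length")
    return rewards[t - 1:]


def reward_to_go(rewards: List[float], t: int) -> float:
    """Undiscounted sum of the rewards that a_t could have influenced."""
    return sum(rewards_from_step(rewards, t))


# Rewards r_1..r_4 from the episode described in the question.
rewards = [0, 0, -1, 10]

candidates = rewards_from_step(rewards, 2)
total = reward_to_go(rewards, 2)
print(candidates, total)  # [0, -1, 10] 9
```

Under that assumption, r_1 is excluded because it was received before a_2 was chosen, while r_2, r_3, and r_4 remain candidates for credit assignment.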

Updated 2025-09-28

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science