
Learn Before

Cumulative Future Reward (Return)

Multiple Choice

An agent interacts with an environment over five time steps and receives the following sequence of rewards, starting from time step 1: [-1, +3, +10, -5, +2]. What is the cumulative future reward (also known as the return) calculated from time step 3?
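As a minimal sketch of the arithmetic involved, the snippet below sums the rewards from a chosen time step through the end of the episode. It assumes an undiscounted return and that the reward received at the starting time step is itself included in the sum; some texts instead start the sum at the following step, so check the convention your course uses. The function name `return_from` is hypothetical and used only for illustration.

```python
def return_from(rewards, t):
    """Undiscounted return from time step t (1-indexed, inclusive).

    Assumption: the reward received at step t counts toward the return.
    """
    # Python lists are 0-indexed, so time step t lives at index t - 1.
    return sum(rewards[t - 1:])


rewards = [-1, 3, 10, -5, 2]

# From time step 3: 10 + (-5) + 2
print(return_from(rewards, 3))  # prints 7
```

Under these assumptions the rewards at time steps 1 and 2 do not contribute, since the return from time step 3 only looks forward.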

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences