Multiple Choice

An agent interacts with an environment over an episode that lasts for 5 time steps (from time step 0 to 4). The sequence of rewards received by the agent is: -1, 0, 5, -2, 10. What is the value of the future return, represented by the notation $\sum_{k=t}^{T} r_k$, if the current time step t is 2 and the final time step T is 4?
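
One way to check an answer is to evaluate the sum directly. The sketch below assumes the rewards are indexed from time step 0, that both endpoints t and T are included, and that no discount factor is applied, as the notation in the question suggests. The function name `future_return` is a hypothetical helper introduced here for illustration.

```python
def future_return(rewards, t, T):
    """Sum the rewards r_t, ..., r_T (both ends inclusive, no discounting)."""
    if not 0 <= t <= T < len(rewards):
        raise ValueError("need 0 <= t <= T < len(rewards)")
    # Python slices exclude the stop index, so use T + 1 to include r_T.
    return sum(rewards[t:T + 1])


# Rewards for time steps 0..4, taken from the question.
rewards = [-1, 0, 5, -2, 10]
print(future_return(rewards, t=2, T=4))
```

Rewards received before time step t (here -1 and 0) do not contribute, because the sum starts at k = t.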

Updated 2025-10-07

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science