1Cademy - An agent is at time step `t`. It must choose between two actions, Action A and Action B. If it chooses Action A, the sequence of rewards it will receive from time step `t` until the end of the episode is `[+1, +1, +10]`. If it chooses Action B, the sequence of rewards it will receive is `[+5, -2, +5]`. To maximize its total accumulated reward from this point forward, which action should the agent choose and why?

Learn Before

Cumulative Future Reward (Return)

Multiple Choice

An agent is at time step `t`. It must choose between two actions, Action A and Action B. If it chooses Action A, the sequence of rewards it will receive from time step `t` until the end of the episode is `[+1, +1, +10]`. If it chooses Action B, the sequence of rewards it will receive is `[+5, -2, +5]`. To maximize its total accumulated reward from this point forward, which action should the agent choose and why?
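
One way to check the comparison is to sum each reward sequence directly. The sketch below assumes the return is undiscounted (a plain sum of future rewards), which is how "total accumulated reward" reads here; the names `undiscounted_return`, `rewards_a`, and `rewards_b` are illustrative and not part of the original question.

```python
# Reward sequences from time step t to the end of the episode, as given in the question.
rewards_a = [1, 1, 10]
rewards_b = [5, -2, 5]


def undiscounted_return(rewards):
    """Sum of all future rewards (assumes no discounting, i.e. gamma = 1)."""
    return sum(rewards)


returns = {
    "A": undiscounted_return(rewards_a),  # 1 + 1 + 10 = 12
    "B": undiscounted_return(rewards_b),  # 5 - 2 + 5 = 8
}

# Pick the action whose return is largest.
best_action = max(returns, key=returns.get)
print(returns, best_action)
```

Under that assumption, Action A yields a return of 12 and Action B a return of 8, so Action A is the better choice even though Action B offers the larger immediate reward (+5 versus +1).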

Updated 2025-10-03
