1Cademy - An agent completes an episode of 4 time steps, receiving the following sequence of rewards: `r_1 = -10`, `r_2 = +2`, `r_3 = +5`, `r_4 = -1`. When updating the agent's decision-making process, what is the 'reward-to-go' value that should be associated with the action taken at time step `t=2`?

Method 1: The score for the action at time step t is the sum of all rewards in the episode: r_1 + r_2 + ... + r_T.
Method 2: The score for the action at time step t is the sum of rewards from time step t onward: r_t + r_{t+1} + ... + r_T.
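The two methods can be compared on the episode above with a short sketch. It assumes undiscounted rewards and 1-indexed time steps, matching the sums written out for each method; the function names `total_return` and `reward_to_go` are illustrative choices, not taken from the source.

```python
from typing import List


def total_return(rewards: List[float]) -> float:
    """Method 1: sum of every reward in the episode (the same value for every t)."""
    return sum(rewards)


def reward_to_go(rewards: List[float], t: int) -> float:
    """Method 2: sum of rewards from time step t (1-indexed) through the final step T."""
    # rewards[0] holds r_1, so r_t starts at index t - 1.
    return sum(rewards[t - 1:])


rewards = [-10, 2, 5, -1]  # r_1, r_2, r_3, r_4 from the question

print(total_return(rewards))                                   # -4
print(reward_to_go(rewards, 2))                                # 6
print([reward_to_go(rewards, t) for t in range(1, 5)])         # [-4, 6, 4, -1]
```

Under Method 1 every action in this episode receives the same score, -4, including the large penalty r_1 = -10 that occurred before the action at t = 2. Under Method 2 the action at t = 2 is credited only with the rewards that follow it: 2 + 5 + (-1) = 6.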

Learn Before

Reward-to-Go
