1Cademy - Cumulative Reward of a Trajectory

Formula

Cumulative Reward of a Trajectory

The cumulative reward, also called the return, of a state-action sequence (or trajectory) is the sum of all the individual rewards received along that sequence. If we denote a trajectory of $T$ time steps as $\tau = \{(s_1,a_1),\dots,(s_T,a_T)\}$ and write $r_t$ for the reward received at step $t$, the cumulative reward $R(\tau)$ is the sum of the rewards from the first step to the final step:

$$R(\tau) = \sum_{t=1}^{T} r_t$$
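As a minimal sketch of this sum in code, assume the per-step rewards $r_1, \dots, r_T$ of one trajectory are already available as a list of numbers. The function name `cumulative_reward` and the sample values are hypothetical, chosen only for illustration; the sum is undiscounted, matching the definition above.

```python
from typing import Sequence


def cumulative_reward(rewards: Sequence[float]) -> float:
    """Return R(tau) = r_1 + ... + r_T for the rewards of one trajectory.

    Assumption: rewards[t - 1] holds r_t, the reward received at step t.
    """
    return float(sum(rewards))


# Hypothetical rewards for a trajectory with T = 4 steps.
example_rewards = [1.0, 0.0, -0.5, 2.0]
example_return = cumulative_reward(example_rewards)
print(example_return)  # 1.0 + 0.0 - 0.5 + 2.0 = 2.5
```

Every step contributes with equal weight here; a discounted return would additionally weight later rewards, which this definition does not do.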