1Cademy - Total Reward (Return)

Learn Before

High Variance in Policy Gradient Estimates

Formula

Total Reward (Return)

In reinforcement learning, the total reward, also known as the return, is the cumulative sum of the rewards an agent receives over a sequence of time steps, typically one episode. Formally, the return is $\sum_{t=1}^{T} r_t$, where $r_t$ is the reward at time step $t$ and $T$ is the final time step. Maximizing this cumulative reward is the agent's primary objective.
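
As a minimal sketch of the definition above, the return of one episode can be computed by summing its per-step rewards. The function name `episode_return` and the sample reward values are illustrative assumptions, not part of the original material.

```python
from typing import Sequence


def episode_return(rewards: Sequence[float]) -> float:
    """Undiscounted return: the sum of r_t for t = 1..T."""
    return float(sum(rewards))


# Hypothetical three-step episode, used only for illustration.
rewards = [1.0, 0.0, 2.5]
total = episode_return(rewards)
print(total)  # 3.5
```

An empty reward sequence gives a return of 0.0 under this sketch, which matches the convention that an empty sum is zero.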

Updated 2026-07-03

References

Reference of Foundations of Large Language Models Course

Learn After

Baseline Method for Policy Gradient Variance Reduction
An agent is being trained in an environment where its sole objective is to maximize the sum of rewards it collects during an episode. The agent completes two separate episodes, receiving the following sequences of rewards:
- Episode A: [+2, +2, +2, +2, +2]
- Episode B: [-5, -5, +10, +10, +1]
Based on the agent's primary objective, which statement correctly compares the outcomes of these two episodes?
Robot Navigation Path Selection
Calculating Episode Return
Cumulative Future Reward (Return)
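
The two reward sequences in the episode-comparison question above can be totalled directly from the definition of the return. The following is a small sketch of that arithmetic; the variable names are assumptions chosen for illustration, and a tie between the two returns is not handled.

```python
# Reward sequences as listed for Episode A and Episode B.
episode_a = [2, 2, 2, 2, 2]
episode_b = [-5, -5, 10, 10, 1]

# The return of each episode is the plain sum of its rewards.
return_a = sum(episode_a)  # 2 * 5 = 10
return_b = sum(episode_b)  # -5 - 5 + 10 + 10 + 1 = 11

# With the sum of rewards as the sole objective, the larger return wins.
better = "B" if return_b > return_a else "A"
print(return_a, return_b, better)  # 10 11 B
```

Under this objective, Episode B is the better outcome (11 versus 10) even though it begins with negative rewards, because only the total matters and not how the rewards are distributed across time steps.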
