1Cademy - Sum of Past Rewards Notation

Learn Before

Causality Principle in Policy Gradient Calculation

Formula

Sum of Past Rewards Notation

The expression $\sum_{k=1}^{t-1} r_k$ denotes the sum of the rewards $r_k$ collected from the first time step ($k=1$) through the step immediately before the current one ($k=t-1$). For $t=1$ the sum is empty and equals zero.
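As a small illustration of the notation, the sum can be computed by slicing a reward sequence. This is only a sketch: the function name `sum_past_rewards`, the convention that `rewards[0]` stores $r_1$, and the reward values are assumptions made for the example, not part of the source.

```python
from typing import Sequence


def sum_past_rewards(rewards: Sequence[float], t: int) -> float:
    """Return r_1 + ... + r_{t-1} for a 1-indexed reward sequence.

    Hypothetical helper: `rewards[0]` is assumed to hold r_1.
    """
    if t < 1 or t > len(rewards) + 1:
        raise ValueError("t must satisfy 1 <= t <= len(rewards) + 1")
    # rewards[0] holds r_1, so r_1..r_{t-1} is the slice rewards[:t-1].
    return float(sum(rewards[: t - 1]))


# Made-up rewards r_1, r_2, r_3, r_4 for illustration only.
rewards = [1.0, 0.5, -2.0, 3.0]

past_at_t1 = sum_past_rewards(rewards, 1)  # empty sum: no rewards before step 1
past_at_t3 = sum_past_rewards(rewards, 3)  # r_1 + r_2
```

The slice stops at index `t - 2`, so the reward received at the current step $t$ is never included, matching the upper limit $t-1$ in the formula.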

Updated 2026-07-01

References

Reference of Foundations of Large Language Models Course

Learn After

An agent in a sequential decision-making process is at time step $t$ and needs to select an action. The agent's goal is to choose actions that maximize the sum of all future rewards. Given that the agent has already received rewards for all actions taken up to this point, how should the quantity represented by the expression $\sum_{k=1}^{t-1} r_k$ be considered when determining the optimal action at the current time step $t$?
True or false: In the context of optimizing an agent's behavior at a specific time step $t$, the quantity represented by the expression $\sum_{k=1}^{t-1} r_k$ is a variable that directly influences the update direction for the agent's current decision.
Calculating Cumulative Past Rewards
