1Cademy - Goal of Reinforcement Learning

Learn Before

Cumulative Reward of a Trajectory

Concept

Goal of Reinforcement Learning

The primary objective in reinforcement learning is to learn a policy that maximizes the agent's cumulative reward, also known as the return, in expectation over an extended period of interaction with its environment.
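As a rough illustration of this definition, the return of a single trajectory can be computed by summing its per-step rewards. The function name `cumulative_reward` and the sample reward list are assumptions made for this sketch. It uses a plain undiscounted sum, although many formulations also apply a discount factor.

```python
from typing import Sequence


def cumulative_reward(rewards: Sequence[float]) -> float:
    """Return the undiscounted sum of the rewards collected along one trajectory."""
    return float(sum(rewards))


# Hypothetical rewards observed over four steps of interaction.
episode_rewards = [1.0, 0.0, -2.0, 5.0]
episode_return = cumulative_reward(episode_rewards)
```

A policy is judged by the returns it tends to produce, not by any single step's reward.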

Updated 2026-05-01

References

Reference of Foundations of Large Language Models Course

Learn After

Objective Function as Expected Cumulative Reward (Performance Function)

An agent is being trained to find the best route through a system. It is presented with two options:
- Route 1: Provides a consistent, small positive reward at every step, resulting in a total reward of +15 for the entire route.
- Route 2: Starts with a step that gives a negative reward (a penalty) of -5, but subsequent steps lead to very high rewards, resulting in a total reward of +50 for the entire route.
An agent that has been successfully trained according to the primary objective of its learning framework will learn to choose Route 2. Which of the following statements best explains why?

Analysis of a Suboptimal Agent Policy

An agent is learning to play a game where the objective is to get the highest possible final score. At a critical decision point, the agent chooses an action that yields an immediate reward of 0, passing up an alternative action that would have given an immediate reward of +10. This decision is necessarily an indication that the agent's policy is flawed and not aligned with the primary goal of its learning framework.
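Both scenarios above turn on the difference between comparing options by their first reward and comparing them by their total reward. The sketch below makes that comparison concrete for the two routes. The per-step reward lists are hypothetical: only the totals (+15 and +50) and the initial -5 penalty come from the question, and the names `routes` and `total_return` are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical per-step rewards, chosen only to match the stated totals.
routes: Dict[str, List[int]] = {
    "Route 1": [3, 3, 3, 3, 3],  # small positive reward at every step, total +15
    "Route 2": [-5, 25, 30],     # initial penalty of -5, total +50
}


def total_return(rewards: List[int]) -> int:
    """Sum the rewards collected along an entire route."""
    return sum(rewards)


# Selection by total return, which is what the learning objective rewards.
best_by_return = max(routes, key=lambda name: total_return(routes[name]))

# Selection by the first step's reward only (a short-sighted comparison).
best_by_first_step = max(routes, key=lambda name: routes[name][0])
```

Under these assumed numbers the two criteria disagree: the first-step comparison favors Route 1, while the total-return comparison favors Route 2. A lower immediate reward is therefore not, on its own, evidence of a flawed policy.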
