Cumulative Reward of a Trajectory
The cumulative reward, also called the return, of a specific state-action sequence (or trajectory) is the sum of all the individual rewards received along that sequence. If we denote a trajectory consisting of T time steps as τ, and the reward received at step t as r_t, the cumulative reward is the sum of the rewards from the first step to the final step:

\[ R(\tau) = \sum_{t=1}^{T} r_t \]
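As a minimal sketch of this definition, the Python snippet below sums a list of per-step rewards. The function name cumulative_reward and the reward values are hypothetical and chosen only for illustration. The sketch assumes a finite trajectory and no discounting, which matches the plain sum described above.

```python
from typing import Sequence


def cumulative_reward(rewards: Sequence[float]) -> float:
    """Return the undiscounted sum of per-step rewards r_1, ..., r_T."""
    total = 0.0
    for r in rewards:  # one reward per time step of the trajectory
        total += r
    return total


# Illustrative (made-up) rewards for a three-step trajectory
example_rewards = [2.0, -1.0, 4.0]
print(cumulative_reward(example_rewards))  # 2 - 1 + 4 = 5.0
```

An empty trajectory yields 0.0 under this sketch, since no rewards are accumulated.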

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Cumulative Reward of a Trajectory
An agent in an environment completes a sequence of two actions. It starts in an initial state s₀, performs action a₀ to reach state s₁, and then performs action a₁ to reach the final state s₂. Which of the following notations correctly represents the full sequence of state-action pairs, often called a trajectory (τ)?
Critiquing Trajectory Notations
An agent interacts with an environment for a total of T time steps, resulting in a sequence of states and actions. Match each mathematical notation for this sequence (trajectory, τ) to the description that accurately characterizes its structure and length.
Learn After
Goal of Reinforcement Learning
Agent Performance Calculation
An agent interacts with an environment over a sequence of four time steps. The rewards it receives at each step are as follows: r₁ = +3, r₂ = -1, r₃ = +5, r₄ = -2. What is the total cumulative reward for this entire sequence?
Consider an agent that completes a five-step sequence of actions, receiving the following rewards at each step: [-5, +1, +1, +1, 0]. This sequence is preferable to another sequence that consists of a single step with a reward of -1.