Learn Before
Sum of Future Rewards Notation
The notation G_t represents the sum of future rewards, also known as the future return. It is the total reward collected from the current time step t to the end of the episode at the final time step T: G_t = R_t + R_{t+1} + ... + R_T.
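As a minimal sketch of the definition above, the future return can be computed by summing the rewards from time step t through the end of the episode (the helper name `future_return` and the sample reward values are illustrative, not from the course):

```python
def future_return(rewards, t):
    """Return G_t: the sum of rewards from time step t to the final step T."""
    return sum(rewards[t:])

# A 5-step episode (time steps 0 through 4) with illustrative rewards.
rewards = [1, 0, -2, 3, 4]

# Future return from t = 2: (-2) + 3 + 4 = 5
print(future_return(rewards, 2))  # -> 5

# Future return from t = 0 is the total reward of the whole episode.
print(future_return(rewards, 0))  # -> 6
```

Note that G_t = R_t + G_{t+1}, so the return from consecutive time steps differs by exactly the immediate reward R_t.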

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
State in the Context of LLMs
An autonomous agent is designed to navigate a maze to find a piece of cheese. At any given moment, the agent knows its current coordinates (e.g., row 3, column 5), whether the adjacent squares contain walls or open paths, and the location of the cheese. Based on this information, the agent must decide whether to move up, down, left, or right. Which of the following best describes the agent's 'state' in this scenario?
Defining the State for a Chess-Playing Agent
Designing a State Representation for a Self-Driving Car
Sum of Future Rewards Notation
Learn After
An agent interacts with an environment over an episode that lasts for 5 time steps (from time step 0 to 4). The sequence of rewards received by the agent is: -1, 0, 5, -2, 10. What is the value of the future return, represented by the notation G_t, if the current time step t is 2 and the final time step T is 4?

Consider an agent interacting with an environment over a single episode. The future return is calculated as the sum of all rewards from a specific time step t to the final time step T, represented by the notation G_t. True or False: For any two consecutive time steps t and t+1 within the episode, the future return calculated from t will be greater than the future return calculated from t+1 if and only if the immediate reward received at time step t, denoted as R_t, is positive.

Calculating Future Return