Learn Before
Agent Performance Calculation
An agent's interaction with an environment is recorded as a sequence of steps, where a numerical reward is assigned after each action. Given the following interaction log for a single episode, calculate the total reward accumulated by the agent from the beginning to the end of the sequence.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Goal of Reinforcement Learning
Agent Performance Calculation
An agent interacts with an environment over a sequence of four time steps. The rewards it receives at each step are as follows: r₁ = +3, r₂ = -1, r₃ = +5, r₄ = -2. What is the total cumulative reward for this entire sequence?
Consider an agent that completes a five-step sequence of actions, receiving the following rewards at each step: [-5, +1, +1, +1, 0]. This sequence is preferable to another sequence that consists of a single step with a reward of -1.
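The two related exercises above both reduce to summing per-step rewards. The following is a minimal Python sketch of that arithmetic, assuming an undiscounted return (a plain sum with no discount factor) and assuming that "preferable" means "has the higher total reward". The function name `total_reward` and the variable names are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def total_reward(rewards: Sequence[float]) -> float:
    """Undiscounted cumulative reward: the plain sum of per-step rewards."""
    return sum(rewards)


# Reward sequences taken from the two exercises above.
four_step = [3, -1, 5, -2]
five_step = [-5, 1, 1, 1, 0]
single_step = [-1]

print(total_reward(four_step))    # 5
print(total_reward(five_step))    # -2
print(total_reward(single_step))  # -1

# Assumption: a sequence is "preferable" when its total reward is higher.
print(total_reward(five_step) > total_reward(single_step))  # False
```

Under these assumptions the four-step sequence totals 5, and the five-step sequence (total -2) is not preferable to the single-step sequence (total -1), even though it contains more positive rewards.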