Learn Before
An agent interacts with an environment over a sequence of four time steps. The rewards it receives at each step are as follows: r₁ = +3, r₂ = -1, r₃ = +5, r₄ = -2. What is the total cumulative reward for this entire sequence?
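As a minimal sketch of the arithmetic, assuming "total cumulative reward" means the plain, undiscounted sum of the per-step rewards, the calculation could look like the following. The function name `cumulative_reward` is a hypothetical helper introduced only for illustration.

```python
# Rewards from the question, in time-step order: r1, r2, r3, r4.
rewards = [3, -1, 5, -2]


def cumulative_reward(rewards):
    """Return the undiscounted sum of a sequence of per-step rewards.

    Assumption: no discount factor is applied, since the question
    does not mention one.
    """
    return sum(rewards)


total = cumulative_reward(rewards)
print(total)  # 3 - 1 + 5 - 2 = 5
```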
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Goal of Reinforcement Learning
Agent Performance Calculation
Consider an agent that completes a five-step sequence of actions, receiving the following rewards at each step: [-5, +1, +1, +1, 0]. This sequence is preferable to another sequence that consists of a single step with a reward of -1.
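One way to check the statement about the two sequences is to compare their totals. The sketch below assumes that "preferable" means a higher undiscounted cumulative reward; under a different criterion (for example, a discounted return) the comparison could come out differently. The variable names are illustrative only.

```python
# Five-step sequence from the statement.
seq_a = [-5, 1, 1, 1, 0]
# Single-step alternative with a reward of -1.
seq_b = [-1]

# Assumption: sequences are ranked by their plain (undiscounted) sums.
return_a = sum(seq_a)  # -5 + 1 + 1 + 1 + 0 = -2
return_b = sum(seq_b)  # -1

a_is_preferable = return_a > return_b
print(return_a, return_b, a_is_preferable)
```

Under this assumption the five-step sequence totals -2, which is lower than -1, so it would not be the preferable one.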