Learn Before
Calculating Discounted Return
An autonomous agent receives the following sequence of rewards starting from the next time-step (t+1): +5, +2, +10. The episode ends after the third reward. If the agent uses a discount factor (γ) of 0.5, what is the total discounted return (G_t) from the current time-step (t)?
0
1
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autonomous agent is being trained to navigate a grid. From its current position, it can choose one of two paths. Path A leads to an immediate reward of +10. Path B involves several steps with no immediate reward, but ultimately leads to a reward of +100. Two separate agents are trained for this task: Agent 1 uses a discount factor of 0.1, and Agent 2 uses a discount factor of 0.9. Based on these settings, which outcome is most likely?
Calculating Discounted Return
Selecting an Appropriate Discount Factor