Learn Before
An autonomous agent is being trained to navigate a grid. From its current position, it can choose one of two paths. Path A leads to an immediate reward of +10. Path B involves several steps with no immediate reward, but ultimately leads to a reward of +100. Two separate agents are trained for this task: Agent 1 uses a discount factor of 0.1, and Agent 2 uses a discount factor of 0.9. Based on these settings, which outcome is most likely?
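The comparison can be checked numerically. The sketch below is a minimal illustration, not part of the original question: it assumes Path A's reward arrives immediately (delay 0) and, because the question only says "several steps", that Path B's reward arrives after a hypothetical delay of 3 steps. The names `discounted_value`, `preferred_path`, and `DELAY_B` are invented for this example.

```python
def discounted_value(reward, delay, gamma):
    """Present value of a single reward received `delay` steps from now."""
    return (gamma ** delay) * reward


def preferred_path(gamma, delay_b):
    """Return 'A' or 'B', whichever has the higher discounted value."""
    value_a = discounted_value(10, 0, gamma)         # immediate reward of +10
    value_b = discounted_value(100, delay_b, gamma)  # delayed reward of +100
    return "A" if value_a > value_b else "B"


# Assumption: the question does not say how many steps Path B takes.
DELAY_B = 3

for name, gamma in [("Agent 1", 0.1), ("Agent 2", 0.9)]:
    value_a = discounted_value(10, 0, gamma)
    value_b = discounted_value(100, DELAY_B, gamma)
    print(f"{name} (gamma={gamma}): A={value_a:.3f}, B={value_b:.3f}, "
          f"prefers Path {preferred_path(gamma, DELAY_B)}")
```

Under this assumed delay, Agent 1 values Path B at roughly 0.1 and so favors Path A, while Agent 2 values Path B at roughly 72.9 and so favors Path B. The ordering depends on the delay: with a discount factor of 0.9, Path B stays ahead only while the delay is 21 steps or fewer.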
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Calculating Discounted Return
Selecting an Appropriate Discount Factor