Learn Before
Selecting an Appropriate Discount Factor
For each agent described in the scenarios below, determine whether a 'myopic' evaluation (using a discount factor close to 0) or a 'far-sighted' evaluation (using a discount factor close to 1) would be more suitable for its training. Justify your reasoning for each choice.
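As a rough illustration of what 'myopic' and 'far-sighted' mean here, the sketch below lists the weight that a reward received k steps in the future gets under each setting. It assumes the standard convention that such a reward is scaled by the discount factor raised to the power k. The function name `discount_weights` and the values 0.1 and 0.9 are illustrative choices, not part of the exercise.

```python
def discount_weights(gamma, horizon):
    """Return the weights gamma**k for k = 0 .. horizon-1.

    Assumes the usual convention that a reward k steps ahead
    is multiplied by gamma**k (hypothetical helper for illustration).
    """
    return [gamma ** k for k in range(horizon)]


# Illustrative values only: a discount factor near 0 versus near 1.
myopic = discount_weights(0.1, 5)
far_sighted = discount_weights(0.9, 5)

for k, (w_near, w_far) in enumerate(zip(myopic, far_sighted)):
    print(f"k={k}: gamma=0.1 -> {w_near:.4f}   gamma=0.9 -> {w_far:.4f}")
```

With a factor close to 0 the weights collapse after the first step or two, so only immediate rewards matter; with a factor close to 1 rewards several steps away still carry most of their value.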
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autonomous agent is being trained to navigate a grid. From its current position, it can choose one of two paths. Path A leads to an immediate reward of +10. Path B involves several steps with no immediate reward, but ultimately leads to a reward of +100. Two separate agents are trained for this task: Agent 1 uses a discount factor of 0.1, and Agent 2 uses a discount factor of 0.9. Based on these settings, which outcome is most likely?
Calculating Discounted Return
Selecting an Appropriate Discount Factor
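The grid scenario in the related question above can be checked numerically with a minimal sketch. The question only says Path B takes "several steps", so the four zero-reward steps used below are an assumption, as is the convention that the first reward on a path is undiscounted. The names `discounted_return`, `path_a`, and `path_b` are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards[k] * gamma**k, with the first reward undiscounted."""
    return sum(r * gamma ** k for k, r in enumerate(rewards))


# Path A: immediate reward of +10.
path_a = [10]
# Path B: ASSUMED four steps with no reward, then +100
# (the source only says "several steps").
path_b = [0, 0, 0, 0, 100]

preferred = {}
for gamma in (0.1, 0.9):
    return_a = discounted_return(path_a, gamma)
    return_b = discounted_return(path_b, gamma)
    preferred[gamma] = "A" if return_a > return_b else "B"
    print(f"gamma={gamma}: A={return_a:.4f}, B={return_b:.4f}, "
          f"prefers Path {preferred[gamma]}")
```

Under these assumptions the agent with a discount factor of 0.1 values Path A more, while the agent with 0.9 values Path B more. A much longer Path B would eventually flip the second result, since the weight on the +100 reward keeps shrinking with each extra step.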