1Cademy - An autonomous agent is being trained to navigate a grid. From its current position, it can choose one of two paths. Path A leads to an immediate reward of +10. Path B involves several steps with no immediate reward, but ultimately leads to a reward of +100. Two separate agents are trained for this task: Agent 1 uses a discount factor of 0.1, and Agent 2 uses a discount factor of 0.9. Based on these settings, which outcome is most likely?

Learn Before

Return

Multiple Choice

An autonomous agent is being trained to navigate a grid. From its current position, it can choose one of two paths. Path A leads to an immediate reward of +10. Path B involves several steps with no immediate reward, but ultimately leads to a reward of +100. Two separate agents are trained for this task: Agent 1 uses a discount factor of 0.1, and Agent 2 uses a discount factor of 0.9. Based on these settings, which outcome is most likely?
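One way to reason about the question is to compare the discounted return of each path under both discount factors. The sketch below is illustrative only. The question says "several steps" without giving a number, so the three zero-reward steps in `PATH_B` are an assumption, and the helper names `discounted_return` and `preferred_path` are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence, with t starting at 0."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Assumed reward sequences (the delay length for Path B is not given in the question).
PATH_A = [10]            # immediate reward of +10
PATH_B = [0, 0, 0, 100]  # three steps with no reward, then +100


def preferred_path(gamma):
    """Return the label of the path with the larger discounted return."""
    return_a = discounted_return(PATH_A, gamma)
    return_b = discounted_return(PATH_B, gamma)
    return "A" if return_a > return_b else "B"


for gamma in (0.1, 0.9):
    print(
        f"gamma={gamma}: "
        f"A={discounted_return(PATH_A, gamma):.3f}, "
        f"B={discounted_return(PATH_B, gamma):.3f}, "
        f"prefers {preferred_path(gamma)}"
    )
```

Under these assumed numbers, the agent with a discount factor of 0.1 values Path B at about 0.1 and so prefers the immediate +10, while the agent with a discount factor of 0.9 values Path B at about 72.9 and prefers the delayed +100. The exact values change with the length of the delay, but a low discount factor shrinks delayed rewards far faster than a high one.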

Updated 2025-09-26
