1Cademy - An agent is being trained in an environment where it must choose between two initial actions from the same starting position. Action A leads to a short sequence of steps resulting in a small, immediate reward. Action B leads to a much longer sequence of steps resulting in a large, delayed reward. According to the action-value function formula, which calculates the expected total discounted reward for taking an action in a state, how would decreasing the discount factor (γ) from a high value (e.g., 0.99) to a very low value (e.g., 0.1) most likely influence the agent's learned behavior?

Learn Before

Action-Value Function Formula

Multiple Choice

An agent is being trained in an environment where it must choose between two initial actions from the same starting position. Action A leads to a short sequence of steps resulting in a small, immediate reward. Action B leads to a much longer sequence of steps resulting in a large, delayed reward. According to the action-value function formula, which calculates the expected total discounted reward for taking an action in a state, how would decreasing the discount factor (γ) from a high value (e.g., 0.99) to a very low value (e.g., 0.1) most likely influence the agent's learned behavior?
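
To make the role of γ concrete, here is a minimal sketch that evaluates the discounted return of each action under both discount factors. It assumes a deterministic environment, so the expectation in the action-value function reduces to a single reward sequence. The reward values and step counts (a reward of 1 at the second step for Action A, a reward of 10 at the twentieth step for Action B) and the helper name `discounted_return` are hypothetical and chosen only for illustration; they are not given in the question.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence, with t starting at 0."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical reward sequences (assumed for illustration, not from the question).
rewards_a = [0.0, 1.0]             # Action A: small reward after 2 steps
rewards_b = [0.0] * 19 + [10.0]    # Action B: large reward after 20 steps

for gamma in (0.99, 0.1):
    q_a = discounted_return(rewards_a, gamma)
    q_b = discounted_return(rewards_b, gamma)
    # A greedy agent picks the action with the larger action value.
    preferred = "A" if q_a > q_b else "B"
    print(f"gamma={gamma}: Q(s, A)={q_a:.3g}, Q(s, B)={q_b:.3g}, greedy choice: {preferred}")
```

Under these assumed numbers, the delayed reward is scaled by γ^19, which stays close to 1 when γ = 0.99 but is vanishingly small when γ = 0.1, so the ordering of the two action values flips between the two settings.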

Updated 2025-09-28
