
Learn Before

Policy in Reinforcement Learning ($\pi$)

Multiple Choice

An agent's goal is to navigate a simple environment and maximize its total reward. The agent is currently in a state 'S'. From this state, it can take one of two actions: 'Action 1' which consistently leads to a reward of +10, or 'Action 2' which consistently leads to a reward of -5. Consider two possible behavior patterns for the agent when it is in state 'S':

* **Behavior A:** The agent chooses 'Action 1' with a 100% probability.
* **Behavior B:** The agent chooses 'Action 1' with a 50% p
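
One way to compare the two behaviors is to treat each as a policy, that is, a probability distribution over the actions available in state 'S', and to compute its expected one-step reward. The sketch below is illustrative only. The names `REWARDS`, `expected_reward`, `behavior_a`, and `behavior_b` are hypothetical. The description of Behavior B is cut off in the text above, so the code assumes that the remaining 50% of its probability goes to 'Action 2', the only other action mentioned.

```python
# Deterministic rewards for the two actions available in state 'S' (from the question).
REWARDS = {"Action 1": 10.0, "Action 2": -5.0}


def expected_reward(policy):
    """Return the expected one-step reward of a policy given as {action: probability}."""
    total_prob = sum(policy.values())
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("action probabilities must sum to 1")
    return sum(prob * REWARDS[action] for action, prob in policy.items())


# Behavior A: always choose 'Action 1'.
behavior_a = {"Action 1": 1.0, "Action 2": 0.0}

# Behavior B: 'Action 1' with probability 0.5.
# ASSUMPTION: the source text is truncated here; the other 0.5 is assigned
# to 'Action 2' because it is the only other action in state 'S'.
behavior_b = {"Action 1": 0.5, "Action 2": 0.5}

print(expected_reward(behavior_a))  # 10.0
print(expected_reward(behavior_b))  # 2.5
```

Under that assumption, Behavior A yields an expected reward of 10.0 and Behavior B yields 0.5 × 10 + 0.5 × (−5) = 2.5, so the deterministic choice of 'Action 1' gives the higher expected reward in state 'S'.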

Updated 2025-10-08

