Learn Before
An autonomous agent is being trained to navigate a maze. The agent's decision-making process at any given intersection (a 'state') is determined by a specific component of its programming. Which of the following scenarios best exemplifies this decision-making component?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Reference Policy
Policy Probability Ratio (Ratio Function)
An autonomous agent is programmed to navigate a grid. When it reaches a specific grid cell (state 'S'), it must choose an action. Consider two different versions of the agent's programming:
- Agent 1: When in state 'S', it is programmed to always choose the action 'move North'.
- Agent 2: When in state 'S', it is programmed to choose 'move North' with 70% probability and 'move East' with 30% probability.
Which statement best analyzes the difference in how these two agents map states to actions?
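One way to make the comparison concrete is to write each agent's state-to-action mapping as a probability distribution over actions. The sketch below is only illustrative: the dictionary layout and the names `agent_1_policy`, `agent_2_policy`, `choose_action`, and `is_deterministic` are assumptions introduced here, not part of the original question.

```python
import random

# Hypothetical representation: each policy maps a state to a
# probability distribution over the actions available in that state.
agent_1_policy = {"S": {"move North": 1.0}}
agent_2_policy = {"S": {"move North": 0.7, "move East": 0.3}}


def choose_action(policy, state, rng):
    """Sample one action from the policy's distribution for `state`."""
    distribution = policy[state]
    actions = list(distribution)
    weights = [distribution[action] for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def is_deterministic(policy, state):
    """True if a single action carries all of the probability mass."""
    return any(prob == 1.0 for prob in policy[state].values())


rng = random.Random(0)  # fixed seed so repeated runs give the same samples
print(is_deterministic(agent_1_policy, "S"))  # True
print(is_deterministic(agent_2_policy, "S"))  # False
print([choose_action(agent_2_policy, "S", rng) for _ in range(5)])
```

Under this representation both agents have the same kind of mapping from state to action. They differ only in how the probability mass is spread across actions in state 'S'.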
An agent's goal is to navigate a simple environment and maximize its total reward. The agent is currently in a state 'S'. From this state, it can take one of two actions: 'Action 1' which consistently leads to a reward of +10, or 'Action 2' which consistently leads to a reward of -5. Consider two possible behavior patterns for the agent when it is in state 'S':
- Behavior A: The agent chooses 'Action 1' with 100% probability.
- Behavior B: The agent chooses 'Action 1' with 50% probability and 'Action 2' with 50% probability.
Which behavior pattern is superior for achieving the agent's goal, and why?
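The comparison can be checked by computing the expected one-step reward of each behavior in state 'S', that is, the probability-weighted sum of the rewards stated in the question. This is a minimal sketch: the names `rewards`, `behavior_a`, `behavior_b`, and `expected_reward` are assumptions introduced for illustration, and it considers only the immediate reward from state 'S'.

```python
# Rewards taken from the question: each action has a fixed outcome.
rewards = {"Action 1": 10, "Action 2": -5}

# Hypothetical encoding of the two behavior patterns in state 'S'.
behavior_a = {"Action 1": 1.0}
behavior_b = {"Action 1": 0.5, "Action 2": 0.5}


def expected_reward(behavior, rewards):
    """Probability-weighted average reward of a behavior in one state."""
    return sum(prob * rewards[action] for action, prob in behavior.items())


print(expected_reward(behavior_a, rewards))  # 10.0
print(expected_reward(behavior_b, rewards))  # 2.5
```

The gap between the two values comes entirely from the probability that Behavior B assigns to the action with negative reward.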