An autonomous agent is navigating a maze. At a particular state, the agent's value function estimates the value of its current state to be 10. The agent decides to move to an adjacent state, receiving an immediate reward of -1 for the move. The value function estimates the value of the new state to be 15. Assuming a discount factor of 0.9, calculate the one-step advantage estimate for the action taken and determine its implication for future action selection.
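The quantities in the question plug directly into the standard one-step (TD) advantage formula, A(s, a) = r + γ·V(s') − V(s). A minimal sketch of the calculation, with the values taken from the question:

```python
# One-step advantage estimate: A(s, a) = r + gamma * V(s') - V(s)
v_current = 10.0   # V(s): value estimate of the current state
reward = -1.0      # immediate reward for the move
v_next = 15.0      # V(s'): value estimate of the new state
gamma = 0.9        # discount factor

td_target = reward + gamma * v_next   # -1 + 0.9 * 15 = 12.5
advantage = td_target - v_current     # 12.5 - 10 = 2.5
print(advantage)  # 2.5
```

The advantage is positive (2.5), meaning the action led to a better outcome than the baseline value V(s) predicted; a policy-gradient update would therefore increase the probability of selecting this action in that state.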
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Derivation of the Advantage Function Estimator
Advantage Function as a Form of Shaped Reward
Evaluating an Agent's Action Choice