Evaluating an Agent's Action Choice
Based on the scenario below, calculate the one-step advantage estimate for the action taken. Then, explain what the sign of this estimate implies about both the action itself and the agent's original valuation of its starting state.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autonomous agent is navigating a maze. At a particular state, the agent's value function estimates the value of its current state to be 10. The agent decides to move to an adjacent state, receiving an immediate reward of -1 for the move. The value function estimates the value of the new state to be 15. Assuming a discount factor of 0.9, calculate the one-step advantage estimate for the action taken and determine its implication for future action selection.
Derivation of the Advantage Function Estimator
Advantage Function as a Form of Shaped Reward
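Worked sketch for the maze scenario
The arithmetic for the maze scenario above can be checked with a short script. This is a minimal sketch that assumes the standard one-step (temporal-difference) form of the estimate, A = r + gamma * V(s') - V(s). The function name one_step_advantage and its parameter names are illustrative and not taken from the course material.

```python
def one_step_advantage(reward, value_s, value_next, gamma):
    """One-step advantage estimate: r + gamma * V(s') - V(s).

    Assumes the standard TD-error form; the names here are illustrative.
    """
    td_target = reward + gamma * value_next  # bootstrapped estimate of the return
    return td_target - value_s               # how much better than expected


# Values from the maze scenario: V(s) = 10, r = -1, V(s') = 15, gamma = 0.9
advantage = one_step_advantage(reward=-1.0, value_s=10.0, value_next=15.0, gamma=0.9)
print(f"one-step advantage estimate: {advantage:.1f}")  # -1 + 0.9 * 15 - 10 = 2.5
```

Under this form the estimate is positive (2.5). That suggests the move turned out better than the value function expected from the starting state, so a policy-gradient style update would tend to make the action more likely in that state. It also suggests that the original estimate V(s) = 10 was low relative to the one-step target of 12.5. This reading is only as reliable as the estimate of V(s') that the target bootstraps from.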