Derivation of the Advantage Function Estimator
Explain why the expression r_t + γV(s_{t+1}) - V(s_t) is considered a valid single-sample estimate for the advantage of taking action a_t in state s_t. Your explanation should break down the expression and relate its components to the formal definitions of the action-value and state-value functions.
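As a starting point for the explanation, the standard definitions are A(s_t, a_t) = Q(s_t, a_t) - V(s_t) and Q(s_t, a_t) = E[r_t + γV(s_{t+1})], where the expectation is over the reward and next state that follow a_t. Replacing that expectation with the single observed transition gives r_t + γV(s_{t+1}) - V(s_t). The sketch below checks this on a small made-up example: the function name `td_advantage`, the two-outcome transition, and all numeric values are assumptions chosen for illustration, and V is treated as if it were exact.

```python
def td_advantage(reward, value_s, value_next, gamma):
    """One-step estimate of the advantage: r_t + gamma * V(s_{t+1}) - V(s_t)."""
    return reward + gamma * value_next - value_s


# Hypothetical toy setup (not from the source): after taking a_t in s_t,
# the environment moves to one of two next states with equal probability.
gamma = 0.5
value_s = 1.5                          # V(s_t), assumed exact
reward = 1.0                           # r_t, same for both outcomes
outcomes = [(0.5, 4.0), (0.5, 0.0)]    # (probability, V(s_{t+1}))

# Q(s_t, a_t) = E[r_t + gamma * V(s_{t+1})], computed exactly here.
q_value = reward + gamma * sum(p * v for p, v in outcomes)
true_advantage = q_value - value_s     # A = Q - V

# Each observed transition yields one single-sample estimate.
samples = [td_advantage(reward, value_s, v, gamma) for _, v in outcomes]

# Weighting the samples by their probabilities recovers the true advantage.
expected_estimate = sum(p * a for (p, _), a in zip(outcomes, samples))

print(samples, expected_estimate, true_advantage)
```

Each individual sample can be far from the true advantage, but the probability-weighted average matches Q(s_t, a_t) - V(s_t), which is what makes the expression a valid (if noisy) single-sample estimate. In practice V is itself an approximation, so the estimate inherits whatever error V carries.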
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autonomous agent is navigating a maze. At a particular state, the agent's value function estimates the value of its current state to be 10. The agent decides to move to an adjacent state, receiving an immediate reward of -1 for the move. The value function estimates the value of the new state to be 15. Assuming a discount factor of 0.9, calculate the one-step advantage estimate for the action taken and determine its implication for future action selection.
Derivation of the Advantage Function Estimator
Advantage Function as a Form of Shaped Reward
Evaluating an Agent's Action Choice