1Cademy - Evaluating an Agent's Action Choice

Learn Before

Temporal Difference (TD) Error as an Advantage Function Estimator

Case Study

Evaluating an Agent's Action Choice

Based on the scenario below, calculate the one-step advantage estimate for the action taken. Then, explain what the sign of this estimate implies about both the action itself and the agent's original valuation of its starting state.

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

An autonomous agent is navigating a maze. At a particular state, the agent's value function estimates the value of its current state to be 10. The agent decides to move to an adjacent state, receiving an immediate reward of -1 for the move. The value function estimates the value of the new state to be 15. Assuming a discount factor of 0.9, calculate the one-step advantage estimate for the action taken and determine its implication for future action selection.
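As a way to check the arithmetic, the scenario can be sketched in a few lines of Python. This assumes the one-step advantage estimate is the TD error, reward + discount * V(next state) - V(current state), as in the prerequisite topic; the function and parameter names are hypothetical and chosen only for illustration.

```python
def one_step_advantage(reward, gamma, v_next, v_current):
    """One-step advantage estimate, taken here to be the TD error:
    reward + gamma * V(next state) - V(current state)."""
    return reward + gamma * v_next - v_current


# Values from the maze scenario: V(s) = 10, reward = -1, V(s') = 15, gamma = 0.9.
estimate = one_step_advantage(reward=-1.0, gamma=0.9, v_next=15.0, v_current=10.0)
print(f"one-step advantage estimate: {estimate:.2f}")

# A positive estimate suggests the action turned out better than V(s) predicted;
# a negative one suggests it turned out worse.
action_better_than_expected = estimate > 0
```

Under these assumptions the estimate is -1 + 0.9 * 15 - 10 = 2.5. The positive sign suggests the move was better than the agent's valuation of the starting state anticipated, which in turn hints that the original estimate of 10 was too low relative to the one-step target of 12.5.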
Derivation of the Advantage Function Estimator
Evaluating an Agent's Action Choice
Advantage Function as a Form of Shaped Reward
