Learn Before
Calculating Advantage Estimate
An agent in a reinforcement learning system takes an action in a given state, resulting in a transition to a new state. A value network provides the following estimates for the states: the value of the current state is 2.5, and the value of the next state is 3.0. A separate reward model provides an immediate reward of 0.5 for this transition. Assuming a discount factor of 0.9, calculate the one-step temporal difference error used as an estimate for the advantage of the action taken. Show your calculation.
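The given values plug directly into the one-step TD formula δ_t = r_t + γ·V(s_{t+1}) − V(s_t). A minimal sketch of that calculation in Python (variable names are illustrative, not from the original card):

```python
# One-step TD error used as an advantage estimate:
#   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)

reward = 0.5          # r_t from the reward model
gamma = 0.9           # discount factor
value_current = 2.5   # V(s_t) from the value network
value_next = 3.0      # V(s_{t+1}) from the value network

td_error = reward + gamma * value_next - value_current
print(td_error)  # 0.5 + 0.9 * 3.0 - 2.5 = 0.7
```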
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An actor-critic agent is being trained to perform a task where explicit rewards are not available from the environment. Instead, a separate, pre-trained reward model provides a scalar reward r_t for each transition (s_t, a_t, s_{t+1}). The agent also maintains a value network that estimates the expected future return from any given state, V(s). Given a discount factor γ, which of the following correctly represents the one-step temporal difference (TD) error used to estimate the advantage of taking action a_t in state s_t?
Calculating Advantage Estimate
Debugging Advantage Estimation in A2C