The Advantage Function, $$A(s_t, a_t)$$, measures how much better or worse a specific action $$a_t$$ taken in state $$s_t$$ turned out than the return expected, on average, from that state. It is calculated by subtracting the state-value function, $$V(s_t)$$, which acts as a baseline ($$b$$), from the sum of rewards actually received from time step $$t$$ to the end of the episode at step $$T$$. The formula is: $$A(s_t, a_t) = \sum_{k=t}^{T} r_k - V(s_t)$$
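The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `advantage`, the reward list, and the baseline value are all hypothetical, and the rewards are summed without discounting, as in the formula above.

```python
from typing import Sequence


def advantage(rewards: Sequence[float], state_value: float) -> float:
    """Return A(s_t, a_t) = sum of rewards from step t onward minus V(s_t).

    `rewards` holds r_t, ..., r_T (the rewards observed after taking the action);
    `state_value` is the baseline V(s_t). Both names are illustrative assumptions.
    """
    return sum(rewards) - state_value


# Hypothetical episode: rewards observed after the action, and an assumed V(s_t).
rewards_after_action = [1.0, 0.0, 2.0]
state_value = 2.5

print(advantage(rewards_after_action, state_value))  # 0.5
```

A positive result means the action did better than the baseline expected from that state; a negative result means it did worse.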

Advantage Function Formula

Given the scenario below, calculate the advantage of the agent's chosen action and explain what the resulting value signifies about the performance of that specific action.

Evaluating an Action's Performance

An agent in a given state `s` takes an action `a`. The sequence of rewards it receives from that point until the end of the episode sums to a total of 10. The pre-calculated value for state `s`, representing the average expected sum of future rewards from that state, is 15. Based on this information, what can be concluded about the action `a`?
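As a worked sketch of this scenario, assuming the reward total of 10 plays the role of $$\sum_{k=t}^{T} r_k$$ and the state value of 15 plays the role of $$V(s)$$:

$$
A(s, a) = \sum_{k=t}^{T} r_k - V(s) = 10 - 15 = -5
$$

The advantage is negative, so in this episode action `a` yielded 5 less reward than the average expected from state `s`; it performed worse than the baseline.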

An agent is in a specific situation where the average expected sum of future rewards is +2. The agent tries two different actions on separate occasions. After taking Action 1, the actual sum of future rewards it receives is +5. After taking Action 2, the actual sum of future rewards is -10. For which action is the 'advantage' value higher, and what does this value signify about that action's quality relative to the average expectation from that situation?
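A worked sketch for the two actions, assuming the average expected sum of +2 serves as the baseline $$V(s)$$ and writing $$a_1$$ and $$a_2$$ (labels introduced here for illustration) for Action 1 and Action 2:

$$
A(s, a_1) = 5 - 2 = +3, \qquad A(s, a_2) = -10 - 2 = -12
$$

Action 1 has the higher advantage. Its positive value indicates that it outperformed the average expectation from that situation by 3, while Action 2 fell short of that expectation by 12.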
