1Cademy - Comparing Action Quality

Learn Before

Advantage Function Formula

Short Answer

Comparing Action Quality

An agent is in a specific situation where the expected sum of future rewards, averaged over the actions it might take, is +2. The agent tries two different actions on separate occasions. After taking Action 1, the actual sum of future rewards it receives is +5. After taking Action 2, the actual sum of future rewards it receives is -10. For which action is the 'advantage' value higher, and what does that value signify about the action's quality relative to the average expectation from that situation?
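The arithmetic behind the question can be sketched in Python. This is a minimal sketch, assuming the advantage is defined as A(s, a) = Q(s, a) - V(s), with V(s) = +2 as the situation's average expectation, and assuming the single return actually observed after each action stands in for Q(s, a). The helper `advantage` and the names `STATE_VALUE` and `observed_returns` are hypothetical, chosen only for illustration.

```python
# Hypothetical helper: one-sample advantage estimate A(s, a) ~ G - V(s),
# where G is the return actually observed after taking action a in state s
# (an assumption: the observed return stands in for Q(s, a)).
def advantage(observed_return: float, state_value: float) -> float:
    return observed_return - state_value


STATE_VALUE = 2.0  # average expected sum of future rewards from the situation (baseline)
observed_returns = {"Action 1": 5.0, "Action 2": -10.0}

# Subtract the same baseline from each action's observed return.
advantages = {
    action: advantage(g, STATE_VALUE) for action, g in observed_returns.items()
}

for action, a in advantages.items():
    verdict = "better" if a > 0 else "worse" if a < 0 else "no better or worse"
    print(f"{action}: advantage {a:+.1f} ({verdict} than the average expectation)")
```

Because both actions are measured against the same +2 baseline, the sign of each advantage indicates whether that action did better or worse than the situation's average, and its magnitude indicates by how much.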

Updated 2025-10-08
