1Cademy - State-Value and Action-Value Functions

In reinforcement learning, value functions estimate the long-term desirability of states or actions. They quantify the expected return, that is, the expected total (typically discounted) reward the agent accumulates from a given point onward. The two main types are:

State-Value Function ( $v_\pi$ ): Also known as the value function, this gives the expected discounted return (i.e., the accumulated discounted rewards) for an agent that starts in a particular state $s$ and follows a specific policy $\pi$ thereafter. The expectation is taken over all possible trajectories originating from that state.
Action-Value Function ( $q_\pi$ ): Also known as the Q-value function, this gives the expected discounted return for an agent that starts in state $s$, takes action $a$, and follows policy $\pi$ thereafter.
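
To make the two definitions concrete, here is a minimal sketch that computes both functions for a tiny example. Everything in it is an illustrative assumption rather than part of the original material: the two-state, two-action environment `P`, the uniform random policy `pi`, the discount factor `gamma`, and the helper names `q_from_v` and `evaluate_policy`. It uses the standard relationship $v_\pi(s) = \sum_a \pi(a \mid s)\, q_\pi(s, a)$ and repeatedly applies it until the values stop changing (iterative policy evaluation).

```python
# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[s][a] is a list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}

# pi[s][a] is the probability that the policy picks action a in state s.
# Here: a uniform random policy (an assumption for the example).
pi = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}

gamma = 0.9  # discount factor (assumed)


def q_from_v(v, s, a):
    """Action value: expected reward for taking a in s, plus the
    discounted state value of wherever the agent lands."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])


def evaluate_policy(tol=1e-10):
    """Iterative policy evaluation: sweep over the states, replacing
    v[s] with the policy-weighted average of the action values, until
    the largest change in a sweep falls below tol."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            new_v = sum(pi[s][a] * q_from_v(v, s, a) for a in P[s])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


v = evaluate_policy()                                      # state values
q = {(s, a): q_from_v(v, s, a) for s in P for a in P[s]}   # action values

for s in P:
    print(f"v({s}) = {v[s]:.3f}")
for (s, a), value in q.items():
    print(f"q({s}, {a}) = {value:.3f}")
```

In this toy setup the state values come out at roughly 7.25 and 7.75, and each one equals the average of that state's two action values because the policy picks both actions with equal probability. Comparing `q[(s, 0)]` with `q[(s, 1)]` shows which action is better in state `s` under this policy, which is information the state-value function alone does not expose.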