Learn Before
State-Value Function (V) Formula
The state-value function, denoted as $V^\pi(s)$, quantifies the expected discounted return (the sum of accumulated rewards) an agent will receive if it starts in a specific state and strictly follows a given policy thereafter. Mathematically, it is expressed as the expectation over all possible state-action trajectories:

$$V^\pi(s) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t R_t \;\middle|\; S_0 = s\right]$$

This can also be expanded to explicitly show the individually discounted future rewards:

$$V^\pi(s) = \mathbb{E}_\pi\!\left[R_0 + \gamma R_1 + \gamma^2 R_2 + \gamma^3 R_3 + \cdots \;\middle|\; S_0 = s\right]$$

In this formula, $\gamma$ ($0 \le \gamma \le 1$) is the discount factor that controls the weight of future rewards, $s$ specifies the initial starting state, and $R_t$ is the reward at time step $t$.
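Because $V^\pi(s)$ is an expectation over trajectories, it can be approximated by averaging many sampled discounted returns. Below is a minimal Monte Carlo sketch of that idea; the toy two-state environment, the 70/30 policy, and all names in it are illustrative assumptions, not part of the course material.

```python
import random

# A minimal Monte Carlo sketch of estimating V^pi(s): average many sampled
# discounted returns. The toy dynamics, the 70/30 policy, and every name
# below are illustrative assumptions, not course material.

GAMMA = 0.9     # discount factor (0 <= gamma < 1)
HORIZON = 200   # truncate the infinite sum; gamma^200 is negligible

def step(state, action):
    """Assumed toy dynamics: returns (next_state, reward)."""
    return ("s1", 1.0) if action == "A" else ("s0", 0.0)

def policy(state):
    """Assumed stochastic policy: action A with probability 0.7."""
    return "A" if random.random() < 0.7 else "B"

def sample_return(state):
    """One sampled discounted return G = sum_t gamma^t * R_t from `state`."""
    g, discount = 0.0, 1.0
    for _ in range(HORIZON):
        state, reward = step(state, policy(state))
        g += discount * reward
        discount *= GAMMA
    return g

def estimate_v(state, n_episodes=10_000):
    """V^pi(state) is the expectation of G; approximate it by averaging."""
    return sum(sample_return(state) for _ in range(n_episodes)) / n_episodes

print(f"Estimated V(s0) ≈ {estimate_v('s0'):.3f}")
```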

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Bellman Equation
An agent is in a state s and must choose between two actions: A and B. According to the agent's current policy, it chooses action A with a 70% probability and action B with a 30% probability. The expected total future reward for taking action A from state s is +20. The expected total future reward for taking action B from state s is -10. Based on this information, which of the following statements correctly describes the relationship between the value of being in state s and the values of taking each action? (See the arithmetic sketch after this list.)
An agent is learning to navigate a complex environment. Match each of the following questions the agent might have with the type of value function that would most directly provide the answer.
RLHF Component Interaction during Token Generation
Action-Value Function Definition
Drone Navigation Decision Analysis
Advantage Function in Terms of Q-values and V-values
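The first related question above rests on the standard identity that a state's value is the policy-weighted average of its action values, $V^\pi(s) = \sum_a \pi(a \mid s)\, Q^\pi(s, a)$. Here is a quick sketch of that arithmetic with the stated numbers; the variable names are illustrative assumptions.

```python
# State value as the policy-weighted average of action values:
# V(s) = sum_a pi(a|s) * Q(s, a)   (numbers taken from the question above)
policy_probs = {"A": 0.7, "B": 0.3}       # pi(a|s)
action_values = {"A": 20.0, "B": -10.0}   # Q(s, a)

v_s = sum(policy_probs[a] * action_values[a] for a in policy_probs)
print(v_s)  # 0.7*20 + 0.3*(-10) = 11.0
```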
Learn After
An agent is in a state 'S' and must choose between two policies, Policy A and Policy B. The sequence of rewards the agent will receive after starting in state 'S' and following each policy is deterministic and known:
- Policy A Reward Sequence: [+10, +1, +1, +1, ...]
- Policy B Reward Sequence: [+3, +3, +3, +3, ...]
Given the formula for the value of a state, $V^\pi(s) = \mathbb{E}_\pi\left[\sum_{t=0}^{\infty} \gamma^t R_t \mid S_0 = s\right]$, which of the following statements correctly analyzes the relationship between the discount factor $\gamma$ and the value of state 'S' for each policy? (See the worked sketch after this list.)
Calculating State Value in a Deterministic Environment
Advantage Function Formula
Temporal Difference (TD) Error as an Advantage Function Estimator
An agent is in a state 'S' and follows a fixed policy. From this state, the environment is stochastic: there is a 50% chance the agent will enter a trajectory with a reward sequence of [+10, 0, 0, ...] and a 50% chance it will enter a different trajectory with a reward sequence of [0, +10, 0, ...]. Given the state-value formula and a discount factor (γ) of 0.9, what is the value of state 'S'?
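Both of the questions above can be worked by plugging their reward sequences into the state-value formula from this card. The sketch below is one illustrative way to do that; the helper name and the closed-form geometric-series handling of the constant reward tail are assumptions. Intuitively, a small $\gamma$ favors Policy A's large immediate reward, while $\gamma$ near 1 favors Policy B's steady stream; in the stochastic case, the state value is the probability-weighted average of the two trajectories' discounted returns.

```python
def discounted_value(first_rewards, tail_reward, gamma):
    """Exact V = sum_t gamma^t * R_t for a reward stream that begins with
    `first_rewards` and then repeats `tail_reward` forever (the constant
    tail is summed in closed form as a geometric series)."""
    v = sum(gamma**t * r for t, r in enumerate(first_rewards))
    tail_start = len(first_rewards)
    v += gamma**tail_start * tail_reward / (1.0 - gamma)
    return v

gamma = 0.9

# Policy comparison: V_A = 10 + gamma/(1-gamma), V_B = 3/(1-gamma)
v_a = discounted_value([10.0], 1.0, gamma)   # [+10, +1, +1, ...]
v_b = discounted_value([], 3.0, gamma)       # [+3, +3, +3, ...]
print(f"V_A = {v_a:.2f}, V_B = {v_b:.2f}")   # V_A = 19.00, V_B = 30.00

# Stochastic question: a 50/50 mix of two deterministic trajectories
v_traj1 = discounted_value([10.0, 0.0], 0.0, gamma)  # [+10, 0, 0, ...]
v_traj2 = discounted_value([0.0, 10.0], 0.0, gamma)  # [0, +10, 0, ...]
v_s = 0.5 * v_traj1 + 0.5 * v_traj2
print(f"V(S) = {v_s}")  # 0.5*10 + 0.5*(0.9*10) = 9.5
```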