Policy A Reward Sequence: [+10, +1, +1, +1, ...]
Policy B Reward Sequence: [+3, +3, +3, +3, ...]

Learn Before

State-Value Function (V) Formula

Multiple Choice

An agent is in a state $S$ and follows a fixed policy $\pi$. From this state, the environment is stochastic: with probability 0.5 the agent enters a trajectory with reward sequence [+10, 0, 0, ...], and with probability 0.5 it enters a different trajectory with reward sequence [0, +10, 0, ...]. Given the state-value formula $V(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s, \pi\right]$ and a discount factor $\gamma = 0.9$, what is the value of state $S$?
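One way to check the arithmetic is to compute each trajectory's discounted return and then weight it by its probability, which is what the expectation in the formula amounts to here. The sketch below is illustrative only: the helper names `discounted_return` and `state_value` are hypothetical, and each reward sequence is truncated after its last nonzero reward on the assumption that all later rewards are 0.

```python
def discounted_return(rewards, gamma):
    """Sum gamma**t * r_t over a finite reward prefix (trailing zeros omitted)."""
    return sum(gamma**t * r for t, r in enumerate(rewards))


def state_value(trajectories, gamma):
    """Probability-weighted average of discounted returns.

    `trajectories` is a list of (probability, rewards) pairs.
    """
    return sum(p * discounted_return(rewards, gamma) for p, rewards in trajectories)


GAMMA = 0.9

# The two equally likely trajectories from the question,
# truncated after the last nonzero reward.
trajectories = [
    (0.5, [10, 0]),   # [+10, 0, 0, ...]  -> return 10
    (0.5, [0, 10]),   # [0, +10, 0, ...]  -> return 0.9 * 10 = 9
]

value_of_s = state_value(trajectories, GAMMA)
print(round(value_of_s, 4))
```

Under these assumptions the first trajectory contributes 0.5 × 10 = 5 and the second contributes 0.5 × 9 = 4.5, so the sketch yields V(S) = 9.5.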

Updated 2025-10-08