Multiple Choice

An agent is being trained using value-based reward shaping. In a particular transition from state s_t to s_{t+1}, the agent receives an environmental reward of r = 0. The agent's current value function estimates that the value of the next state, V(s_{t+1}), is substantially higher than the value of the current state, V(s_t). Based on the formula r' = r + γV(s_{t+1}) - V(s_t), what is the most likely consequence of this shaping for the agent's learning on this specific transition?
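
To make the arithmetic concrete, here is a minimal Python sketch of the shaped reward for such a transition. All numeric values are assumed for illustration; gamma, V_s, and V_s_next are hypothetical names, not from the question.

```python
# Value-based reward shaping for one transition:
# r' = r + gamma * V(s_{t+1}) - V(s_t)

gamma = 0.99       # discount factor (assumed value)
r = 0.0            # environmental reward, as stated in the question
V_s = 1.0          # V(s_t): current-state value estimate (assumed)
V_s_next = 5.0     # V(s_{t+1}): substantially higher estimate (assumed)

r_shaped = r + gamma * V_s_next - V_s
print(r_shaped)    # 3.95 > 0: shaping turns a zero environmental reward
                   # into a positive learning signal for this transition
```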

Updated 2025-10-04

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science