Learn Before
An agent is being trained using an algorithm where the value network's performance is measured by the mean squared difference between its predicted value for a state, V(s_t), and a computed target value, r_t + γ * V(s_{t+1}). During a particular training batch, the network consistently produces predictions V(s_t) that are significantly lower than the computed target values. What is the most direct effect on the network's parameters during the subsequent optimization step?
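For concreteness, here is a minimal sketch of the loss described above and the direction of the resulting update when predictions undershoot the targets. PyTorch is assumed, and the tensor values and variable names are illustrative, not taken from the course:

```python
import torch

# Predictions V(s_t) that undershoot their targets (illustrative values).
v_pred = torch.tensor([1.0, 1.5, 0.8], requires_grad=True)
# TD targets r_t + gamma * V(s_{t+1}), treated as fixed labels (no gradient).
td_target = torch.tensor([3.0, 3.2, 2.9])

# Mean squared difference between prediction and target.
loss = torch.mean((v_pred - td_target) ** 2)
loss.backward()

# d(loss)/d(v_pred) = 2 * (v_pred - td_target) / N, which is negative here
# because v_pred < td_target, so a gradient-descent step moves the
# predictions upward, toward the targets.
print(v_pred.grad)  # all components negative
```

Because the target is held fixed during the step, only the prediction branch receives gradient, and the parameters are nudged so that V(s_t) increases toward r_t + γ * V(s_{t+1}).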
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Batch Size for Sequential Data in A2C Value Loss
An agent is being trained using a reinforcement learning algorithm where the value network's loss is based on the mean squared temporal difference (TD) error. For a single transition, the agent moves from state s_t to s_{t+1}, receiving a reward r_t. The value network predicts the value of the current state as V(s_t) and the next state as V(s_{t+1}). Given the following values, calculate the squared TD error, which represents the loss for this single sample before averaging:
- Reward r_t = 2
- Discount factor γ = 0.9
- Predicted value of current state V(s_t) = 5.0
- Predicted value of next state V(s_{t+1}) = 4.0
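Plugging the given numbers into the TD-error definition, δ_t = r_t + γ * V(s_{t+1}) − V(s_t), yields the loss for this single sample. A minimal sketch in plain Python (variable names are illustrative):

```python
# Squared TD error for the single transition above.
r_t = 2.0        # reward
gamma = 0.9      # discount factor
v_s = 5.0        # V(s_t), predicted value of the current state
v_s_next = 4.0   # V(s_{t+1}), predicted value of the next state

td_error = r_t + gamma * v_s_next - v_s  # 2 + 3.6 - 5.0 = 0.6
loss = td_error ** 2                     # 0.36, the squared TD error
print(td_error, loss)
```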
Rationale for the Value Network Target