Learn Before
In the context of estimating the advantage of taking an action a_t in a state s_t, the formula A(s_t, a_t) = (∑_{k=t}^{T} r_k) - V(s_t) is often used. What is the primary role of the reward-to-go term, ∑_{k=t}^{T} r_k, within this specific estimation?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Policy Gradient with Reward-to-Go and Baseline
Calculating Advantage from a Trajectory
In a given trajectory, if the calculated advantage A(s_t, a_t) = (∑_{k=t}^{T} r_k) - V(s_t) is negative, it implies that the action a_t taken in state s_t led to a sequence of rewards that was worse than the average expected outcome from that state.
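The estimate above can be sketched in a few lines of Python: compute the reward-to-go ∑_{k=t}^{T} r_k as a suffix sum over the trajectory's rewards, then subtract the value baseline V(s_t) at each step. The reward and value numbers below are illustrative assumptions, not from the source.

```python
def reward_to_go(rewards):
    """Suffix sums: the reward-to-go from each timestep t through the end T."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

def advantages(rewards, values):
    """A(s_t, a_t) = (reward-to-go at t) - V(s_t), per timestep."""
    return [g - v for g, v in zip(reward_to_go(rewards), values)]

# One hypothetical trajectory of length T = 3:
rewards = [1.0, 0.0, 2.0]   # r_t observed after each action
values  = [2.5, 1.0, 1.5]   # baseline estimates V(s_t)

print(reward_to_go(rewards))        # [3.0, 2.0, 2.0]
print(advantages(rewards, values))  # [0.5, 1.0, 0.5]
```

A negative entry here would mean the action's observed return fell below the baseline's expectation for that state, matching the statement above; positive entries mean the action did better than expected.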