Simplification of the Trajectory Log-Probability Gradient
After decomposing the trajectory log-probability gradient, it is typical in reinforcement learning settings to assume that the environment's dynamics P(s_{t+1}|s_t, a_t) are not directly influenced by the policy parameters θ. Consequently, the gradient of the dynamics term, ∂/∂θ Σ_t log P(s_{t+1}|s_t, a_t), is zero. We can therefore simplify the overall gradient to the policy component alone: ∂/∂θ Σ_t log π_θ(a_t|s_t). This simplification lets the learning algorithm concentrate solely on policy updates, without needing to model or even know the underlying environmental dynamics.
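To make this concrete, here is a minimal sketch of a REINFORCE-style surrogate loss built from this simplified gradient. It assumes a PyTorch policy that maps a state to a torch.distributions object (e.g. Categorical); the names policy_gradient_loss, trajectory, and total_return are illustrative, not part of any specific library.

```python
import torch

def policy_gradient_loss(policy, trajectory, total_return):
    """REINFORCE-style surrogate loss for one sampled trajectory.

    Only the policy term Σ_t log π_θ(a_t|s_t) appears here; the environment
    term Σ_t log P(s_{t+1}|s_t, a_t) is omitted because its gradient with
    respect to θ is zero under the fixed-dynamics assumption.
    """
    log_probs = torch.stack([
        policy(state).log_prob(action)  # log π_θ(a_t | s_t)
        for state, action in trajectory
    ])
    # Maximizing return * Σ_t log-prob == minimizing its negative.
    return -total_return * log_probs.sum()
```

An optimizer step on this loss updates θ using only the agent's own action log-probabilities and the observed return, which is precisely why the simplification enables model-free policy gradient methods.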

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Simplification of the Trajectory Log-Probability Gradient
In the derivation of a policy-based reinforcement learning algorithm, the gradient of the log-probability of a trajectory τ (a sequence of states and actions) with respect to policy parameters θ is transformed as shown below:
Initial form:
∂/∂θ log [ Π_t (π_θ(a_t|s_t) * P(s_{t+1}|s_t, a_t)) ]
Decomposed form:
∂/∂θ Σ_t log π_θ(a_t|s_t) + ∂/∂θ Σ_t log P(s_{t+1}|s_t, a_t)
By analyzing the components of the decomposed form, what is the most significant implication for the learning algorithm?
A key step in deriving policy-based reinforcement learning algorithms involves transforming the gradient of the log-probability of a trajectory. Arrange the following mathematical expressions to show the correct sequence of this transformation, starting from the initial combined form to the final decomposed form.
Evaluating a Policy Gradient Implementation
Learn After
Policy Gradient Estimate under Uniform Trajectory Probability
In policy gradient methods, the gradient of the log-probability of a trajectory is initially expressed as the sum of two components: one related to the agent's actions and another related to the environment's transitions. The expression is then simplified by removing the environment's component before optimization. Given the initial expression ∂/∂θ Σ_t log π_θ(a_t|s_t) + ∂/∂θ Σ_t log P(s_{t+1}|s_t, a_t), what is the fundamental assumption that justifies simplifying this to just the policy component, ∂/∂θ Σ_t log π_θ(a_t|s_t)?
Applicability of Policy Gradient Methods
Practical Implications of the Policy Gradient Simplification