Learn Before
Consider the approximated policy divergence penalty formula: Penalty = Σ [log π_θ(a_t|s_t) - log π_θ_ref(a_t|s_t)]. This penalty's value for a fixed trajectory of states and actions is sensitive to changes in the environment's transition dynamics.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An agent executes an identical sequence of states and actions in two different environments, A and B. The agent's policy (π_θ) and a reference policy (π_θ_ref) are also the same in both scenarios. When calculating the approximated policy divergence penalty using the formula
Penalty = Σ [log π_θ(a_t|s_t) - log π_θ_ref(a_t|s_t)], the result is identical for both environments. What is the fundamental reason for this?
Calculating Approximated Policy Divergence
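The scenario above can be sketched in a few lines of Python. The log-probability values below are purely illustrative placeholders; the point is that the penalty is a function only of the two policies' log-probabilities on the fixed trajectory, so environments A and B (whatever their transition dynamics) yield the same number.

```python
# Hypothetical log-probabilities assigned by the current policy pi_theta and
# the frozen reference policy pi_theta_ref to each action a_t taken in state
# s_t along one fixed trajectory. The values are illustrative only.
logp_current = [-0.5, -1.2, -0.3]
logp_reference = [-0.7, -1.0, -0.4]

def divergence_penalty(logp_cur, logp_ref):
    """Penalty = sum_t [log pi_theta(a_t|s_t) - log pi_theta_ref(a_t|s_t)]."""
    return sum(c - r for c, r in zip(logp_cur, logp_ref))

# Environments A and B may differ in transition dynamics, but for the SAME
# trajectory and the SAME policies the penalty's inputs are identical, so
# the penalty itself is identical -- dynamics never enter the formula.
penalty_env_a = divergence_penalty(logp_current, logp_reference)
penalty_env_b = divergence_penalty(logp_current, logp_reference)
assert penalty_env_a == penalty_env_b
```

This makes the answer to the question concrete: the transition probabilities of the environment appear nowhere in the penalty, only the policies' conditional log-probabilities over the given state-action pairs.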