Concept

Off-Policy Value Target γ\gamma

In the context of model-based reinforcement learning, utilizing an off-policy value target (γ\gamma) generally provides faster initial convergence speeds across various environments. However, experimental comparisons demonstrate that relying solely on this off-policy target is often insufficient; it must be combined with an on-policy value target to maintain stable learning and achieve optimal long-term performance.

0

1

Updated 2026-05-17

Tags

Data Science