Concept
Off-Policy Value Target
In model-based reinforcement learning, an off-policy value target generally speeds up initial convergence across a range of environments. However, experimental comparisons show that relying on this target alone is often insufficient: it must be combined with an on-policy value target to keep learning stable and reach the best long-term performance.
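As a rough illustration of what combining the two targets can look like, the sketch below mixes an on-policy n-step return with a one-step greedy bootstrap target. The source does not say how the targets are combined, so the convex weighting, the parameter `beta`, the greedy one-step form of the off-policy target, and all function names here are assumptions made for illustration only.

```python
def on_policy_target(rewards, bootstrap_value, gamma):
    """Discounted n-step return along the trajectory that was actually taken."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target


def off_policy_target(reward, next_q_values, gamma):
    """One-step target bootstrapping from the greedy action's value.

    Assumption: this is a stand-in for an off-policy target, not
    necessarily the one the concept refers to.
    """
    return reward + gamma * max(next_q_values)


def combined_target(on_target, off_target, beta):
    """Convex mix (assumed): beta = 1 is purely off-policy, beta = 0 purely on-policy."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * off_target + (1.0 - beta) * on_target


# Hypothetical numbers, chosen only to exercise the functions.
on_t = on_policy_target(rewards=[1.0, 0.0, 1.0], bootstrap_value=0.5, gamma=0.9)
off_t = off_policy_target(reward=1.0, next_q_values=[0.2, 1.5], gamma=0.9)
mixed = combined_target(on_t, off_t, beta=0.5)
```

Under this assumed scheme, `beta` could start high to benefit from the faster early convergence and be lowered later so the on-policy target stabilizes learning; whether such a schedule helps would have to be checked empirically.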
Updated 2026-05-17
Tags
Data Science