Concept

Off-Policy and On-Policy in MuZero

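Because MuZero trains on trajectories sampled from a replay buffer, most of the data in a training batch was generated by earlier versions of the network rather than the current one, so the algorithm can be considered off-policy. However, the policy head is trained to match the MCTS visit-count distribution, which is the same distribution the actions were sampled from during self-play. Viewed from the perspective of the behavior and target policies, which coincide for each stored step, it could also be considered on-policy.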
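The two readings can be illustrated with a toy replay buffer. This is only a sketch under stated assumptions, not MuZero's actual implementation: `search_policy` is a hypothetical stand-in for the MCTS visit-count distribution, `version` is a made-up counter for network checkpoints, and the buffer stores plain dictionaries instead of full trajectories.

```python
import random
from collections import deque

random.seed(0)

def search_policy(version, num_actions=3):
    """Hypothetical stand-in for the MCTS visit-count distribution
    produced with network checkpoint `version`."""
    weights = [1.0 + ((version + a) % num_actions) for a in range(num_actions)]
    total = sum(weights)
    return [w / total for w in weights]

buffer = deque(maxlen=100)

# Self-play: each checkpoint generates 10 steps and stores them.
for version in range(5):
    for _ in range(10):
        pi = search_policy(version)
        action = random.choices(range(len(pi)), weights=pi)[0]
        buffer.append({
            "version": version,  # checkpoint that generated this step
            "behavior": pi,      # distribution the action was sampled from
            "target": pi,        # distribution the policy head is trained toward
            "action": action,
        })

current_version = 4

# Off-policy reading: most stored steps come from older checkpoints.
stale_fraction = sum(s["version"] < current_version for s in buffer) / len(buffer)

# On-policy reading: for every stored step, behavior and target coincide.
matched = all(s["behavior"] == s["target"] for s in buffer)

print(f"stale fraction: {stale_fraction}, behavior == target: {matched}")
```

In this toy setup the data is stale relative to the current checkpoint, yet each stored step pairs an action with the very distribution it was drawn from, which is why both labels can be defended.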

Updated 2021-08-19

Tags

Data Science