Concept

MuZero

MuZero is a model-based reinforcement learning method which contains a representation function, a dynamics function, and a prediction function.The representation function encodes an observation sequence into a hidden state. The dynamics function evaluates the current hidden state and a potential action to determine the potential next hidden state and a reward value. Lastly, the prediction function is the policy generation function. They use similar parameters and are optimized according to a combined loss function.

A Monte Carlo Tree Search (MCTS) is often used when training to evaluate action trees from the current/ root state.

0

1

Updated 2021-08-19

Tags

Data Science