Learn the Model in Model-Based Methods for Deep RL
In model-based reinforcement learning, the dynamics model is either known in advance or learned from data. In the latter case, we run a base policy, such as a random policy or any reasonably informed one, and record the resulting trajectories. The procedure is:
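1. Run the base policy to collect a dataset of (state, action, next state) transitions.
2. Learn a dynamics model that minimizes the prediction error on the dataset.
3. Backpropagate through the model into the policy to optimize the policy parameters.
4. Run the policy and add the resulting transitions to the dataset.
5. Repeat from step 2.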
In step 2, we use supervised learning to train the model: it is fit to minimize the least-squares error between its predicted next state and the next state actually observed in the sampled trajectories.
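As a minimal sketch of this fitting step, the snippet below collects transitions with a random base policy and fits a linear model by least squares. Everything in it is an assumption made for illustration: the one-dimensional environment `true_step`, its coefficients, the linear model form, and the helper names are not from the original text. A deep RL setup would typically use a neural network trained by gradient descent on the same squared-error objective.

```python
import random

def true_step(s, a):
    # Hypothetical "real" dynamics; the learner never sees these coefficients.
    return 0.9 * s + 0.5 * a

def collect(policy, n_steps, rng, s0=1.0):
    """Run `policy` in the environment and record (s, a, s_next) transitions."""
    data, s = [], s0
    for _ in range(n_steps):
        a = policy(s, rng)
        s_next = true_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_linear_model(data):
    """Least-squares fit of s_next ~ w_s * s + w_a * a (2x2 normal equations)."""
    ss = sum(s * s for s, a, _ in data)
    sa = sum(s * a for s, a, _ in data)
    aa = sum(a * a for s, a, _ in data)
    sn = sum(s * n for s, _, n in data)
    an = sum(a * n for _, a, n in data)
    det = ss * aa - sa * sa
    # Cramer's rule for [[ss, sa], [sa, aa]] @ [w_s, w_a] = [sn, an].
    w_s = (sn * aa - sa * an) / det
    w_a = (ss * an - sa * sn) / det
    return w_s, w_a

def random_policy(s, rng):
    # Base policy: ignore the state and act uniformly at random.
    return rng.uniform(-1.0, 1.0)

rng = random.Random(0)
dataset = collect(random_policy, 50, rng)
w_s, w_a = fit_linear_model(dataset)
print(w_s, w_a)  # close to the true 0.9 and 0.5, because the data is noise-free
```

Because the assumed environment is exactly linear and noise-free, the fit recovers its coefficients up to floating-point error. With noisy or nonlinear dynamics the fit would only be approximate.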
In step 3, we use the model to predict the next state from the current state and action, use the policy to choose the next action, and use each state and action to compute the cost. Finally, we backpropagate the accumulated cost through the model to train the policy.
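The sketch below works through this step by hand for a very small case, to make the chain rule explicit. It assumes a linear model `s_next = w_s * s + w_a * a`, a one-parameter policy `a = -k * s`, and a quadratic cost `s**2 + r * a**2`. None of these choices come from the original text, and the model weights are stand-in values rather than learned ones.

```python
def rollout_cost_and_grad(k, w_s, w_a, s0=1.0, horizon=20, r=0.1):
    """Return the model-predicted cost of the policy a = -k * s and d(cost)/dk."""
    # Forward pass: unroll policy and model, storing what the backward pass needs.
    states, actions, cost, s = [], [], 0.0, s0
    for _ in range(horizon):
        a = -k * s                    # policy picks the action
        cost += s * s + r * a * a     # cost of this (state, action) pair
        states.append(s)
        actions.append(a)
        s = w_s * s + w_a * a         # model predicts the next state
    # Backward pass: chain rule applied from the last step to the first.
    grad_k, grad_s_next = 0.0, 0.0    # no cost is charged on the final state
    for s, a in zip(reversed(states), reversed(actions)):
        grad_a = 2.0 * r * a + w_a * grad_s_next
        grad_k += grad_a * (-s)       # da/dk = -s
        # Gradient with respect to this step's state (da/ds = -k).
        grad_s_next = 2.0 * s + w_s * grad_s_next + grad_a * (-k)
    return cost, grad_k

w_s, w_a = 0.9, 0.5                   # stand-in for a learned model
k = 0.0
cost_before, _ = rollout_cost_and_grad(k, w_s, w_a)
for _ in range(200):                  # plain gradient descent on the policy parameter
    _, g = rollout_cost_and_grad(k, w_s, w_a)
    k -= 0.01 * g
cost_after, _ = rollout_cost_and_grad(k, w_s, w_a)
print(cost_before, cost_after, k)
```

In practice an automatic differentiation library would compute the backward pass, and the policy would usually be a neural network. The hand-written version is only meant to show how the cost gradient flows through the model into the policy parameter.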
We continue to sample new data and refit the model as the policy improves, so that the model stays accurate in the states the current policy actually visits.
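The steps above can be tied together in one loop. The following is a toy sketch under several assumptions that are not from the original text: a hypothetical, mildly nonlinear one-dimensional environment `env_step`, a linear model, a one-parameter policy `a = -k * s` with exploration noise, a quadratic cost, and arbitrary step sizes and iteration counts.

```python
import math
import random

def env_step(s, a):
    # Hypothetical mildly nonlinear environment; a linear model is only approximate.
    return 0.9 * s + 0.5 * a + 0.1 * math.sin(s)

def run_policy(k, n_steps, rng, noise, s0=1.0):
    """Run a = -k * s plus exploration noise; return (s, a, s_next) transitions."""
    data, s = [], s0
    for _ in range(n_steps):
        a = -k * s + rng.uniform(-noise, noise)
        s_next = env_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_model(data):
    """Step 2: least-squares fit of s_next ~ w_s * s + w_a * a."""
    ss = sum(s * s for s, a, _ in data)
    sa = sum(s * a for s, a, _ in data)
    aa = sum(a * a for s, a, _ in data)
    sn = sum(s * n for s, _, n in data)
    an = sum(a * n for _, a, n in data)
    det = ss * aa - sa * sa
    return (sn * aa - sa * an) / det, (ss * an - sa * sn) / det

def model_cost_grad(k, w_s, w_a, s0=1.0, horizon=15, r=0.1):
    """Step 3: d(model-predicted cost)/dk for the policy a = -k * s."""
    states, s = [], s0
    for _ in range(horizon):          # forward pass through policy and model
        states.append(s)
        s = w_s * s + w_a * (-k * s)
    grad_k, grad_s = 0.0, 0.0
    for s in reversed(states):        # backward pass (chain rule)
        a = -k * s
        grad_a = 2.0 * r * a + w_a * grad_s
        grad_k += grad_a * (-s)
        grad_s = 2.0 * s + w_s * grad_s - k * grad_a
    return grad_k

def real_cost(k, s0=1.0, horizon=15, r=0.1):
    """Cost of the noise-free policy in the real environment, for monitoring only."""
    cost, s = 0.0, s0
    for _ in range(horizon):
        a = -k * s
        cost += s * s + r * a * a
        s = env_step(s, a)
    return cost

rng = random.Random(0)
k = 0.0
dataset = run_policy(k, 30, rng, noise=1.0)       # step 1: with k = 0 this is a random policy
cost_start = real_cost(k)
for _ in range(5):                                # step 5: repeat from step 2
    w_s, w_a = fit_model(dataset)                 # step 2: refit on all data so far
    for _ in range(20):                           # step 3: improve the policy on the model
        k -= 0.005 * model_cost_grad(k, w_s, w_a)
    dataset += run_policy(k, 10, rng, noise=0.2)  # step 4: gather data with the new policy
cost_end = real_cost(k)
print(cost_start, cost_end, len(dataset))
```

Aggregating the new transitions with the old ones, rather than replacing them, keeps the least-squares problem well conditioned here: the improved policy alone produces states and actions that are nearly collinear.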
Updated 2020-10-17
Tags
Data Science