Concept

Learn the Model in Model-Based Methods for Deep RL

In model-based reinforcement learning, the model may be known or learned. In the latter case, we run a base policy, such as a random or other exploratory policy, and observe the resulting trajectories.

  1. run a base policy $\pi_0(s_t, a_t)$ to collect $D = \{ (s, a, s')_i \}$
  2. learn a dynamics model $f(s, a)$ to minimize $\sum_i \| f(s_i, a_i) - s_i' \|^2$
  3. backpropagate through $f(s, a)$ into the policy to optimize $\pi_\theta(s_t, a_t)$
  4. run $\pi_\theta(s_t, a_t)$ and add the resulting data $\{ (s, a, s')_j \}$ to $D$
  5. repeat from step 2
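The loop above can be sketched on a toy one-dimensional linear system. Everything here is an illustrative assumption (the true dynamics, the linear model class, the linear policy $a = k s$, the quadratic cost, and all step sizes), not part of the original algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(s, a):
    # Unknown environment dynamics (assumed for this toy example)
    return 0.8 * s + 0.5 * a + 0.01 * rng.normal()

def collect(policy, n=200):
    # Roll out a policy and record (s, a, s') transitions
    data, s = [], rng.normal()
    for _ in range(n):
        a = policy(s)
        s_next = true_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_model(data):
    # Step 2: least-squares fit of a linear model f(s, a) = w_s*s + w_a*a
    X = np.array([[s, a] for s, a, _ in data])
    y = np.array([sp for _, _, sp in data])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w  # [w_s, w_a]

k = 0.0                                  # linear policy a = k * s
D = collect(lambda s: rng.normal())      # step 1: random base policy

for _ in range(5):                       # step 5: outer loop
    w_s, w_a = fit_model(D)              # step 2
    # Step 3: gradient descent on the predicted cost (s')^2,
    # differentiating (w_s + k*w_a)^2 through the learned model
    for _ in range(100):
        grad = 2 * (w_s + k * w_a) * w_a
        k -= 0.1 * grad
    D += collect(lambda s: k * s)        # step 4

# k approaches -w_s / w_a of the fitted model (about -1.6 here),
# which drives the predicted next state toward zero
```

The random base policy in step 1 matters: once the learned policy makes $a$ a deterministic function of $s$, the new transitions alone would not identify $w_s$ and $w_a$ separately, so keeping the earlier data in $D$ is what keeps the fit well-posed.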

In step 2 above, we use supervised learning to fit the model, minimizing the least-squares error on the sampled transitions.

In step 3, we use the model to predict the next state given the current state and action, use the policy to choose the next action, and use the resulting states and actions to compute the cost. Finally, we backpropagate the cost through the model to train the policy.
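A minimal sketch of this backward pass, assuming (purely for illustration) a learned linear model $f(s, a) = w_s s + w_a a$, a linear policy $a = \theta s$, and a per-step cost $s^2$ over a horizon $H$; the reverse loop applies the chain rule through every model step:

```python
# Hypothetical learned model weights and policy parameter
w_s, w_a = 0.9, 0.4
theta = -1.0
s0, H = 1.0, 10

def grad_theta(theta):
    # Forward pass: roll the policy through the *learned* model
    s = [s0]
    for _ in range(H):
        a = theta * s[-1]               # policy action
        s.append(w_s * s[-1] + w_a * a) # model prediction
    # Reverse pass: accumulate dJ/dtheta, J = sum of s_t^2
    g, ds = 0.0, 0.0                    # ds holds dJ/ds_{t+1}
    for t in reversed(range(H)):
        ds += 2 * s[t + 1]              # cost gradient at step t+1
        g += ds * w_a * s[t]            # through a = theta * s_t
        ds *= w_s + w_a * theta         # through the model step
    return g

for _ in range(500):
    theta -= 0.2 * grad_theta(theta)

# theta converges to -w_s / w_a (-2.25 here), which zeroes the
# closed-loop multiplier w_s + w_a * theta under the learned model
```

In practice the forward and reverse passes are handled by automatic differentiation in the deep-learning framework; the sketch just makes the chain rule through the learned dynamics explicit.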

We continue to sample data and refit the model as the policy improves along the path.

Updated 2020-10-17

Tags

Data Science