Concept

Analysis (Accelerating Human Learning With Deep Reinforcement Learning)

  1. TRPO outperforms the other baselines on GPL.
  2. On EFC and HLR, TRPO sometimes outperforms the other baselines and is sometimes outperformed by them.
  3. TRPO performs worse than the threshold policy (described as unsurprising, since the threshold policy has access to the latent parameters).

The authors find it interesting that TRPO sometimes performs worse than the baselines on EFC and HLR. They state that additional hyperparameters and policies could improve TRPO's performance.
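
To make the third point more concrete, here is a minimal sketch of a threshold-style policy, assuming an exponential forgetting curve of the form exp(-difficulty * elapsed / strength). The functional form, the field names (`difficulty`, `strength`, `last_review`), and the threshold value are illustrative assumptions, not taken from the paper's code. The sketch only shows why reading the latent parameters directly is an advantage: the policy can compute each item's recall likelihood exactly, whereas TRPO has to infer it from observations.

```python
import math

def recall_probability(difficulty, elapsed, strength):
    """Assumed exponential forgetting curve: recall decays with elapsed time."""
    return math.exp(-difficulty * elapsed / strength)

def threshold_policy(items, now, threshold=0.5):
    """Pick the item whose recall likelihood is closest to the threshold.

    The policy reads each item's latent parameters (difficulty, strength)
    directly, which a learned policy such as TRPO cannot do.
    """
    def gap(i):
        item = items[i]
        p = recall_probability(
            item["difficulty"], now - item["last_review"], item["strength"]
        )
        return abs(p - threshold)

    return min(range(len(items)), key=gap)

# Hypothetical items; the values are made up for illustration.
items = [
    {"difficulty": 0.1, "strength": 1.0, "last_review": 0.0},
    {"difficulty": 0.5, "strength": 1.0, "last_review": 0.0},
    {"difficulty": 0.1, "strength": 5.0, "last_review": 0.0},
]
chosen = threshold_policy(items, now=2.0, threshold=0.5)
```

With these made-up values, the second item has a recall likelihood of exp(-1), roughly 0.37, which is nearer to 0.5 than the other two, so it is the one selected.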

Updated 2020-11-07

Tags

Data Science