Concept

Algorithm (Accelerating Human Learning With Deep Reinforcement Learning)

The authors use trust region policy optimization (TRPO) with a gated recurrent unit (GRU) policy to solve a partially observable Markov decision process (POMDP). As input, the policy receives the identity of the previous item, the binary recall outcome, and the time elapsed. It then outputs the next item to present. To scale to a much larger number of items, the authors use a random projection trick.
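The input encoding described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the random projection is a fixed Gaussian matrix applied to the one-hot item identity, and the names make_projection and encode_observation, the projection dimension, and the use of the raw elapsed time are all hypothetical choices made for the example.

```python
import numpy as np


def make_projection(n_items, dim, seed=0):
    """Fixed random Gaussian matrix mapping one-hot item ids to `dim` dimensions.

    Assumption: a scaled Gaussian projection; the source only says
    "random projection trick" without giving the exact construction.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_items, dim)) / np.sqrt(dim)


def encode_observation(item_id, recalled, elapsed, projection):
    """Build one policy input from the three quantities named in the text.

    item_id  : index of the previously shown item
    recalled : binary outcome (True if the item was recalled)
    elapsed  : time elapsed (units are an assumption of this sketch)
    """
    # Selecting a row is equivalent to one_hot(item_id) @ projection,
    # but avoids materializing a vector with one entry per item.
    item_vec = projection[item_id]
    return np.concatenate([item_vec, [float(recalled), float(elapsed)]])


# Hypothetical sizes: 10,000 items compressed to 32 dimensions.
n_items, dim = 10_000, 32
projection = make_projection(n_items, dim)
obs = encode_observation(item_id=3, recalled=True, elapsed=120.0,
                         projection=projection)
```

With this encoding the input size is dim + 2 regardless of how many items exist, which is what lets a recurrent policy handle a large item set. In a full system a vector like obs would be fed to the GRU policy at each step.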

Updated 2020-10-17

Tags

Data Science

Learn After