Concept

Performance of TRPO with reward shaping (research objective) (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)

To measure the performance of TRPO with reward shaping, the number of episodes per run was set to 40, all other parameters were left unchanged, and the EFC environment was used. The LSTMs were trained in the following ways:

  1. Using data from a random sample
  2. Using data from a random-policy tutor
  3. Using data from the SuperMemo tutor
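
The setup above can be written down as a small configuration sketch. This is only an illustration: the names RunConfig, LstmDataSource, and build_run_configs are hypothetical and do not come from the original work, and the sketch records only the settings stated here (TRPO, the EFC environment, 40 episodes per run, and the three LSTM training data sources).

```python
from dataclasses import dataclass
from enum import Enum


class LstmDataSource(Enum):
    """Where the LSTM training data comes from (hypothetical names)."""
    RANDOM_SAMPLE = "random sample"
    RANDOM_POLICY_TUTOR = "random policy tutor"
    SUPERMEMO_TUTOR = "SuperMemo tutor"


@dataclass(frozen=True)
class RunConfig:
    """One experimental condition; only values stated in the text are fixed."""
    lstm_data_source: LstmDataSource
    algorithm: str = "TRPO"
    environment: str = "EFC"
    episodes_per_run: int = 40


def build_run_configs():
    # One configuration per LSTM training data source; everything else is shared.
    return [RunConfig(lstm_data_source=source) for source in LstmDataSource]


configs = build_run_configs()
for config in configs:
    print(config.algorithm, config.environment,
          config.episodes_per_run, config.lstm_data_source.value)
```

Keeping the data source as the only varying field makes it explicit that the three conditions differ solely in how the LSTMs were trained.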


Updated 2020-10-29

Tags

Data Science
