Concept

Performance of TRPO with reward shaping (research objective) (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)

To make the agent less dependent on the student state, an LSTM was used. Three different datasets were used to train the LSTM. The DRL agent's performance improved as the quality of the training dataset improved, and the agent using the LSTM learned faster and more smoothly than the agent using the average-sum-of-outcomes reward function.
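The contrast between the two reward signals can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the baseline reward is the mean of binary recall outcomes (1 = recalled, 0 = forgotten) at a review step, and that the shaped reward is the mean recall probability predicted by a student model from each item's review history. The function names and `toy_predictor` are hypothetical; `toy_predictor` is a smoothed success-rate stand-in for a trained LSTM, not an LSTM itself.

```python
from typing import Callable, Sequence


def average_outcome_reward(outcomes: Sequence[int]) -> float:
    """Baseline reward (assumed form): mean of binary recall outcomes."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


def model_based_reward(
    histories: Sequence[Sequence[int]],
    predict_recall: Callable[[Sequence[int]], float],
) -> float:
    """Shaped reward (assumed form): mean predicted recall probability.

    `predict_recall` stands in for the trained student model (an LSTM in
    the source); here it is any callable mapping a history to [0, 1].
    """
    if not histories:
        return 0.0
    return sum(predict_recall(h) for h in histories) / len(histories)


def toy_predictor(history: Sequence[int]) -> float:
    """Hypothetical stand-in for an LSTM: Laplace-smoothed success rate."""
    return (sum(history) + 1) / (len(history) + 2)


if __name__ == "__main__":
    # Binary outcomes give a coarse reward; model predictions give graded values.
    print(average_outcome_reward([1, 0, 1, 1]))
    print(model_based_reward([[1, 1], [0], [1, 0, 1]], toy_predictor))
```

Because a predicted probability varies continuously while raw outcomes are 0 or 1, a model-based reward of this kind tends to be less noisy from step to step, which is one plausible reading of why the LSTM-based agent learned more smoothly.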

Updated 2020-11-01

Tags

Data Science