Performance of TRPO with reward shaping (research objective) (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)
To make the agent less dependent on the student state, an LSTM was used. Three different datasets were used to train the LSTM. The performance of the DRL agent improved as the quality of the dataset improved, and the agent with the LSTM learned faster and more smoothly than the agent using the average-of-sum-of-outcomes reward function.
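For orientation, the baseline reward mentioned above can be sketched as follows. This is only an illustrative guess at what an average-of-outcomes reward might look like, assuming each review outcome is recorded as 1 (recalled) or 0 (forgotten). The function name and the handling of an empty history are hypothetical and are not taken from the original work.

```python
from typing import Sequence


def average_outcome_reward(outcomes: Sequence[int]) -> float:
    """Hypothetical baseline reward: mean of binary recall outcomes.

    Assumes 1 = item recalled, 0 = item forgotten. Returning 0.0 for an
    empty history is an arbitrary choice made for this sketch.
    """
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


# Example: three of four reviewed items were recalled.
print(average_outcome_reward([1, 0, 1, 1]))
```

An LSTM-based shaped reward would replace this fixed average with a quantity predicted by a model trained on review logs, which is why dataset quality matters in the comparison described above.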
Tags
Data Science
Related
Relation between rewards and thresholds (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)
Performance of DRL agent when the number of items are varied (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)
Comparison of Performance of TRPO and TNPG algorithms (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)
Comparison between likelihood and average of sum of outcomes based reward functions (research objective) (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)