Relation between rewards and thresholds (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)
The researchers analyzed which methods generated the maximum rewards. They varied the threshold value between 0 and 1, recorded the reward at each value, and observed how the reward changed with the threshold.
With the likelihood-based reward function, the EFC student model had its highest reward values when the threshold was between 0.2 and 0.4. According to this, student learning is most efficient when the probability that the student will answer correctly is between 0.2 and 0.4. With the log-likelihood-based reward function, reward values decreased as the threshold increased.
For the HLR student model, the plots of reward against threshold looked like a step function for both reward functions. This means that the optimal reward values occurred when the threshold was in the range 0 to 0.5.
For the GPL (DASH) student model, the likelihood reward function gave very noisy reward values as the threshold increased. With the log-likelihood reward function, the reward values decreased as the threshold increased, similar to the trend seen with the EFC student model.
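The threshold sweep described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' code: it assumes EFC refers to an exponential forgetting curve of the form exp(-difficulty * elapsed / strength), that the threshold tutor reviews the item whose predicted recall is closest to the threshold, and that the likelihood and log-likelihood rewards are the mean recall probability and mean log recall probability over all items. The function names, the strength update rule and every parameter value are hypothetical.

```python
import math
import random


def recall_prob(elapsed, strength, difficulty):
    # Assumed exponential forgetting curve: recall decays with time since the last review.
    return math.exp(-difficulty * elapsed / strength)


def run_threshold_tutor(threshold, n_items=20, n_steps=200, seed=0):
    """Simulate one review session and return (mean likelihood, mean log-likelihood)."""
    rng = random.Random(seed)
    # Hypothetical item difficulties, drawn once per session.
    difficulty = [rng.uniform(0.05, 0.3) for _ in range(n_items)]
    strength = [1.0] * n_items
    last_review = [0] * n_items
    lik_total = 0.0
    loglik_total = 0.0

    for t in range(1, n_steps + 1):
        probs = [
            recall_prob(t - last_review[i], strength[i], difficulty[i])
            for i in range(n_items)
        ]
        # Threshold policy: review the item whose recall is closest to the threshold.
        item = min(range(n_items), key=lambda i: abs(probs[i] - threshold))
        strength[item] += 1.0  # hypothetical rule: each review strengthens the memory
        last_review[item] = t

        # Recompute recall after the review and score the step with both rewards.
        probs = [
            recall_prob(t - last_review[i], strength[i], difficulty[i])
            for i in range(n_items)
        ]
        lik_total += sum(probs) / n_items
        loglik_total += sum(math.log(p) for p in probs) / n_items

    return lik_total / n_steps, loglik_total / n_steps


def sweep(num_points=11):
    # Evaluate evenly spaced thresholds from 0 to 1 and record both rewards.
    thresholds = [i / (num_points - 1) for i in range(num_points)]
    return {th: run_threshold_tutor(th) for th in thresholds}


if __name__ == "__main__":
    for th, (lik, loglik) in sweep().items():
        print(f"threshold={th:.1f}  likelihood={lik:.3f}  log-likelihood={loglik:.3f}")
```

Plotting the two recorded values against the threshold gives curves of the kind discussed above. The shapes from this toy model depend on the assumed parameters and should not be expected to reproduce the reported results.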