1Cademy - Experimental Setup (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)

Number of items: 30
Number of runs: 10
Number of episodes per run: 100
Number of steps per episode: 200
Delay between steps: 5s
Four baseline policies: Leitner, SuperMemo, Random, Threshold
EFC: $\theta$ sampled from a log-normal distribution, $\log\theta \sim N(\log 0.077, 1)$
HLR: $\vec{\theta} = (1, 1, 0, \theta_3 \sim N(0, 1))$, with features $x_i$ = (num attempts, num correct, num incorrect, one-hot encoding of item $i$ out of $n$ items).
GPL: student ability $a = \vec{\alpha} = 0$; item difficulties sampled as $d \sim N(1, 1)$ and $\log \vec{d} \sim N(1, 1)$; delay coefficient $\log r \sim N(0, 0.001)$; window coefficients $\theta_{2w} = \theta_{2w-1} = \frac{1}{\sqrt{W - w + 1}}$; number of windows $W = 5$.
TRPO and TNPG settings were the same
LSTM: 20 units, dense unit with sigmoid, Adam optimizer. 2 hidden states.
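
The parameter draws listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original experiment code: the function names are hypothetical, $\theta_3$ in the HLR model is read as one $N(0, 1)$ weight per one-hot item slot, and only the EFC draw, the HLR weights and features, and the GPL window coefficients are covered.

```python
import math
import random

# Settings copied from the list above.
N_ITEMS = 30
N_RUNS = 10
N_EPISODES_PER_RUN = 100
N_STEPS_PER_EPISODE = 200
STEP_DELAY_SECONDS = 5
N_WINDOWS = 5  # W in the GPL model


def sample_efc(rng, n_items=N_ITEMS):
    """EFC: one theta per item with log(theta) ~ N(log 0.077, 1)."""
    return [rng.lognormvariate(math.log(0.077), 1.0) for _ in range(n_items)]


def sample_hlr(rng, n_items=N_ITEMS):
    """HLR weights: (1, 1, 0) for the three count features, followed by
    theta_3, assumed here to be one N(0, 1) weight per one-hot slot."""
    return [1.0, 1.0, 0.0] + [rng.gauss(0.0, 1.0) for _ in range(n_items)]


def hlr_features(item, attempts, correct, incorrect, n_items=N_ITEMS):
    """HLR features x_i: three counts plus a one-hot encoding of the item."""
    one_hot = [0.0] * n_items
    one_hot[item] = 1.0
    return [float(attempts), float(correct), float(incorrect)] + one_hot


def gpl_window_coefficients(n_windows=N_WINDOWS):
    """GPL: theta_{2w} = theta_{2w-1} = 1 / sqrt(W - w + 1) for w = 1..W."""
    coeffs = []
    for w in range(1, n_windows + 1):
        c = 1.0 / math.sqrt(n_windows - w + 1)
        coeffs += [c, c]  # the pair (theta_{2w-1}, theta_{2w})
    return coeffs


# Total environment steps implied by the settings above.
TOTAL_STEPS = N_RUNS * N_EPISODES_PER_RUN * N_STEPS_PER_EPISODE
```

Passing a seeded generator such as `random.Random(0)` to the samplers keeps each of the 10 runs reproducible. The window coefficients grow with $w$, from $1/\sqrt{5}$ for the first window to 1 for the last.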

Updated 2020-10-29