Concept

Experimental Setup (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)

The experimental setup involved the following parameters and configurations:

  1. Number of items: 30
  2. Number of runs: 10
  3. Number of episodes per run: 100
  4. Number of steps per episode: 200
  5. Delay between steps: 5s
  6. Four baseline policies: Leitner, SuperMemo, Random, and Threshold
  7. EFC: $\theta$ was sampled from a log-normal distribution, where $\log \theta \sim \mathcal{N}(\log(0.077), 1)$.
  8. HLR: $\overrightarrow{\theta} = (1, 1, 0, \theta_3 \sim \mathcal{N}(0, 1))$, where $x_i$ represents (number of attempts, number of correct, number of incorrect, one-hot encoding of item $i$ out of $n$ items).
  9. GPL: Student ability $a = \overrightarrow{\alpha} = 0$. Sampled item difficulties $d \sim \mathcal{N}(1, 1)$ and $\log \overrightarrow{d} \sim \mathcal{N}(1, 1)$. The delay coefficient is $\log r \sim \mathcal{N}(0, 0.001)$, window coefficients are $\theta_{2w} = \theta_{2w - 1} = \frac{1}{\sqrt{W - w + 1}}$, and the number of windows is $W = 5$.
  10. TRPO and TNPG: Settings were kept identical.
  11. LSTM: Configured with 20 units, a dense unit with sigmoid activation, the Adam optimizer, and 2 hidden states.
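
The sampling rules in items 7 to 9 can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the authors' code. The names `SetupConfig`, `sample_efc_theta`, `sample_hlr_theta`, `gpl_window_coeffs`, and `sample_gpl_delay_coeff` are hypothetical. The sketch assumes one EFC parameter is drawn per item, that the HLR entry $\theta_3$ stands for one $\mathcal{N}(0, 1)$ weight per one-hot item feature, and that the second argument of $\mathcal{N}(\cdot, \cdot)$ is a variance.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupConfig:
    # Values copied from the list above; the field names are hypothetical.
    n_items: int = 30
    n_runs: int = 10
    n_episodes: int = 100
    n_steps: int = 200
    step_delay_s: float = 5.0
    n_windows: int = 5


def sample_efc_theta(rng, n_items):
    # log(theta) ~ N(log 0.077, 1), so theta is log-normal.
    # Assumption: one independent draw per item.
    return [rng.lognormvariate(math.log(0.077), 1.0) for _ in range(n_items)]


def sample_hlr_theta(rng, n_items):
    # Fixed weights (1, 1, 0) for attempts / correct / incorrect, followed by
    # one N(0, 1) weight per item for the one-hot block (assumed reading).
    return [1.0, 1.0, 0.0] + [rng.gauss(0.0, 1.0) for _ in range(n_items)]


def gpl_window_coeffs(n_windows):
    # theta_{2w-1} = theta_{2w} = 1 / sqrt(W - w + 1) for w = 1..W,
    # returned in the order theta_1, theta_2, ..., theta_{2W}.
    coeffs = []
    for w in range(1, n_windows + 1):
        c = 1.0 / math.sqrt(n_windows - w + 1)
        coeffs.extend([c, c])
    return coeffs


def sample_gpl_delay_coeff(rng, variance=0.001):
    # log(r) ~ N(0, 0.001); assumption: 0.001 is a variance, not a std.
    return rng.lognormvariate(0.0, math.sqrt(variance))


if __name__ == "__main__":
    cfg = SetupConfig()
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    print("steps per policy:", cfg.n_runs * cfg.n_episodes * cfg.n_steps)
    print("simulated seconds per episode:", cfg.n_steps * cfg.step_delay_s)
    print("first EFC theta:", sample_efc_theta(rng, cfg.n_items)[0])
    print("GPL window coefficients:", gpl_window_coeffs(cfg.n_windows))
```

With the listed settings, each policy sees 10 × 100 × 200 = 200,000 steps in total, and the window coefficients increase from $1/\sqrt{5}$ for $w = 1$ to 1 for $w = 5$.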

Updated 2026-05-08

Tags

Data Science
