Concept

Implementation Details (Accelerating Human Learning With Deep Reinforcement Learning)

These are the parameters used by the authors for their experiments:

  1. $n = 30$ (number of items)
  2. $T = 200$ (number of steps)
  3. $D = 5$ (delay between steps in seconds)
  4. For the EFC (Exponential Forgetting Curve) student model, the item difficulty $\theta$ is sampled from the distribution $\log\theta \sim \mathcal{N}(\log(0.077), 1)$.
  5. For the HLR (Half-Life Regression) student model: $\overrightarrow{\theta} = (1, 1, 0, \theta_3)$, where $\theta_3 \sim \mathcal{N}(0, 1)$; and $\overrightarrow{x_i} =$ (number of attempts, number correct, number incorrect, one-hot encoding of item $i$ out of $n$ items).
  6. For the GPL (Generalized Power-Law) student model: $a = \overrightarrow{\alpha} = 0$; $d \sim \mathcal{N}(1, 1)$; $\log \overrightarrow{d} \sim \mathcal{N}(0, 1)$; $\log r \sim \mathcal{N}(0, 0.01)$; $W = 5$; $\theta_{2w} = \theta_{2w - 1} = \frac{1}{\sqrt{W} - w + 1}$
  7. For the TRPO algorithm, the batch size is 4000, the discount factor is $\gamma = 0.99$, and the step size is 0.01.
  8. For the Recurrent Network Policy, the hidden layer has 32 units.
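
The sampling rules in the list above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the function name `sample_student_params` and the dictionary keys are hypothetical, the second argument of each $\mathcal{N}(\cdot, \cdot)$ is read as a variance (so the standard deviation is its square root), and every sampled quantity is drawn independently once per item. In particular, treating $\theta_3$ as one draw per item, so that it lines up with the one-hot block of $\overrightarrow{x_i}$, is a guess about the intended shape.

```python
import numpy as np

# Experiment constants taken from the list above.
N_ITEMS = 30        # n: number of items
N_STEPS = 200       # T: number of steps
DELAY_SECONDS = 5   # D: delay between steps, in seconds


def sample_student_params(rng, n_items=N_ITEMS):
    """Draw one set of student-model parameters (hypothetical helper).

    Assumptions: N(mean, v) is parameterised by variance v, and each
    sampled quantity is drawn independently per item.
    """
    # EFC: log(theta) ~ N(log(0.077), 1), so theta is log-normal.
    efc_theta = np.exp(rng.normal(np.log(0.077), 1.0, size=n_items))

    # HLR: theta = (1, 1, 0, theta_3) with theta_3 ~ N(0, 1).
    # Assumption: theta_3 has one entry per item (one-hot block).
    hlr_theta3 = rng.normal(0.0, 1.0, size=n_items)
    hlr_theta = np.concatenate(([1.0, 1.0, 0.0], hlr_theta3))

    # GPL: d ~ N(1, 1); log(d_vec) ~ N(0, 1); log(r) ~ N(0, 0.01).
    gpl_d = rng.normal(1.0, 1.0, size=n_items)
    gpl_d_vec = np.exp(rng.normal(0.0, 1.0, size=n_items))
    gpl_r = np.exp(rng.normal(0.0, np.sqrt(0.01), size=n_items))

    return {
        "efc_theta": efc_theta,
        "hlr_theta": hlr_theta,
        "gpl_d": gpl_d,
        "gpl_d_vec": gpl_d_vec,
        "gpl_r": gpl_r,
    }


if __name__ == "__main__":
    params = sample_student_params(np.random.default_rng(0))
    for name, value in params.items():
        print(name, value.shape)
```

Passing a seeded generator such as `np.random.default_rng(0)` keeps the draws reproducible across runs. If the variances are meant as standard deviations instead, only the `np.sqrt(0.01)` term changes, since the other scales are 1.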

Updated 2026-05-08

Tags

Data Science