1Cademy - Implementation Details (Accelerating Human Learning With Deep Reinforcement Learning)

The authors used the following parameters in their experiments:

1. $n = 30$ (number of items)
2. $T = 200$ (number of steps)
3. $D = 5$ (delay between steps, in seconds)
4. For the EFC (Exponential Forgetting Curve) student model, the item difficulty $\theta$ is sampled from the distribution $\log\theta \sim \mathcal{N}(\log(0.077), 1)$.
5. For the HLR (Half-Life Regression) student model: $\overrightarrow{\theta} = (1, 1, 0, \theta_3)$, where $\theta_3 \sim \mathcal{N}(0, 1)$; and $\overrightarrow{x_i} = $ (number of attempts, number correct, number incorrect, one-hot encoding of item $i$ out of $n$ items).
6. For the GPL (Generalized Power-Law) student model: $a = \overrightarrow{\alpha} = 0$; $d \sim \mathcal{N}(1, 1)$; $\log \overrightarrow{d} \sim \mathcal{N}(0, 1)$; $\log r \sim \mathcal{N}(0, 0.01)$; $W = 5$; $\theta_{2w} = \theta_{2w - 1} = \frac{1}{\sqrt{W} - w + 1}$
7. For the TRPO algorithm, the batch size is 4000, $\gamma = 0.99$, and the step size is 0.01.
8. For the Recurrent Network Policy, the hidden layer size is 32.
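As a rough illustration of how the EFC and HLR settings above could be drawn in practice, here is a minimal Python sketch using only the standard library. The function names and the seed are hypothetical, not taken from the authors' code. It also assumes that $\theta_3$ stands for one independent $\mathcal{N}(0, 1)$ weight per item (matching the one-hot block of $\overrightarrow{x_i}$), which the text does not state explicitly. The GPL and TRPO settings are left out.

```python
import math
import random

N_ITEMS = 30        # n: number of items
N_STEPS = 200       # T: number of steps
DELAY_SECONDS = 5   # D: delay between steps, in seconds


def sample_efc_difficulties(rng, n_items=N_ITEMS):
    """Draw one difficulty per item with log(theta) ~ N(log 0.077, 1)."""
    return [rng.lognormvariate(math.log(0.077), 1.0) for _ in range(n_items)]


def sample_hlr_theta(rng, n_items=N_ITEMS):
    """Build (1, 1, 0, theta_3).

    Assumption: theta_3 is a block of n_items independent N(0, 1) weights,
    one per entry of the one-hot item encoding.
    """
    return [1.0, 1.0, 0.0] + [rng.gauss(0.0, 1.0) for _ in range(n_items)]


def hlr_features(n_attempts, n_correct, n_incorrect, item, n_items=N_ITEMS):
    """Build x_i = (attempts, correct, incorrect, one-hot encoding of item)."""
    one_hot = [0.0] * n_items
    one_hot[item] = 1.0
    return [float(n_attempts), float(n_correct), float(n_incorrect)] + one_hot


rng = random.Random(0)  # arbitrary seed, for reproducibility only
efc_theta = sample_efc_difficulties(rng)
hlr_theta = sample_hlr_theta(rng)
x_4 = hlr_features(n_attempts=3, n_correct=2, n_incorrect=1, item=4)
```

Because $\log\theta$ is normal, every sampled EFC difficulty is strictly positive, and the HLR parameter and feature vectors both have length $n + 3$, so their dot product is well defined.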

Updated 2026-07-09

Contributors: Nineli Lashkarashvili (San Diego State University)
