Concept

Implementation Details (Accelerating Human Learning With Deep Reinforcement Learning)

These are the parameters used by the authors:

  1. n = 30 (number of items)

  2. T = 200 (number of steps)

  3. D = 5 (delay between steps, in seconds)

  4. For EFC (exponential forgetting curve), item difficulty $\theta$ is sampled from the log-normal distribution $\log\theta \sim N(\log 0.077,\ 1)$.

  5. For HLR (half-life regression): $\overrightarrow{\theta} = (1,\ 1,\ 0,\ \theta_3)$ with $\theta_3 \sim N(0, 1)$, and $\overrightarrow{x_i}$ = (num attempts, num correct, num incorrect, one-hot encoding of item $i$ out of $n$ items).

  6. For GPL (generalized power law): $a = \overrightarrow{\alpha} = 0$; $d \sim N(1, 1)$; $\log\overrightarrow{d} \sim N(0, 1)$; $\log r \sim N(0, 0.01)$; $W = 5$; $\theta_{2w} = \theta_{2w-1} = \frac{1}{\sqrt{W - w + 1}}$.

  7. For TRPO: batch size 4000, $\gamma = 0.99$, step size 0.01.

  8. For the recurrent network policy, the number of hidden units is 32.
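The student-memory parameters above can be made concrete with a small sketch. This is not the authors' code: it assumes the standard functional forms for the two simpler models, namely $P(\text{recall}) = e^{-\theta \Delta}$ for the exponential forgetting curve and $P(\text{recall}) = 2^{-\Delta/h}$ with half-life $h = 2^{\overrightarrow{\theta} \cdot \overrightarrow{x}}$ for half-life regression, and it samples $\theta$ exactly as described in items 4 and 5:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 30   # number of items
T = 200  # number of review steps
D = 5    # delay between steps (seconds)

# --- EFC: exponential forgetting curve ---
# Item difficulty is log-normal: log(theta) ~ N(log 0.077, 1).
theta_efc = np.exp(rng.normal(np.log(0.077), 1.0, size=n))

def efc_recall_prob(theta_i, elapsed):
    """P(recall) = exp(-theta * elapsed)."""
    return np.exp(-theta_i * elapsed)

# --- HLR: half-life regression ---
# Fixed weights (1, 1, 0) on (num attempts, num correct, num incorrect),
# plus a per-item bias theta_3 ~ N(0, 1) picked out by the one-hot encoding.
theta3 = rng.normal(0.0, 1.0, size=n)

def hlr_recall_prob(item, attempts, correct, incorrect, elapsed):
    """Half-life h = 2^(theta . x); P(recall) = 2^(-elapsed / h)."""
    log2_h = 1.0 * attempts + 1.0 * correct + 0.0 * incorrect + theta3[item]
    half_life = 2.0 ** log2_h
    return 2.0 ** (-elapsed / half_life)
```

The RL agent would interact with such a model as its environment: each scheduled review updates the attempt/correct counts, which lengthens the half-life under HLR and so raises the recall probability at the next step.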


Updated 2020-10-17

Tags

Data Science