Concept

Implementation Details (Accelerating Human Learning With Deep Reinforcement Learning)

These are the parameters used by the authors:

  1. n = 30 (number of items)

  2. T = 200 (number of steps)

  3. D = 5 (delay between steps, in seconds)

  4. For EFC (exponential forgetting curve), item difficulty $\theta$ is sampled from the log-normal distribution $\log\theta \sim N(\log 0.077,\ 1)$.

  5. For HLR (half-life regression): $\overrightarrow{\theta} = (1,\ 1,\ 0,\ \theta_3)$ with $\theta_3 \sim N(0, 1)$, and $\overrightarrow{x_i}$ = (num attempts, num correct, num incorrect, one-hot encoding of item $i$ out of $n$ items).

  6. For GPL (generalized power law): $a = \overrightarrow{\alpha} = 0$; $d \sim N(1, 1)$; $\log\overrightarrow{d} \sim N(0, 1)$; $\log r \sim N(0, 0.01)$; $W = 5$; $\theta_{2w} = \theta_{2w-1} = \frac{1}{\sqrt{W - w + 1}}$.

  7. For TRPO: batch size 4000, $\gamma = 0.99$, step size 0.01.

  8. For the recurrent network policy, the number of hidden units is 32.
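The student-memory parameters above can be made concrete with a small sketch. This is not the authors' code: it assumes the standard functional forms for the two simpler models, namely $P(\text{recall}) = e^{-\theta \Delta}$ for the exponential forgetting curve and $P(\text{recall}) = 2^{-\Delta/h}$ with half-life $h = 2^{\overrightarrow{\theta} \cdot \overrightarrow{x}}$ for half-life regression, and it samples $\theta$ exactly as described in items 4 and 5:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 30   # number of items
T = 200  # number of review steps
D = 5    # delay between steps (seconds)

# --- EFC: exponential forgetting curve ---
# Item difficulty is log-normal: log(theta) ~ N(log 0.077, 1).
theta_efc = np.exp(rng.normal(np.log(0.077), 1.0, size=n))

def efc_recall_prob(theta_i, elapsed):
    """P(recall) = exp(-theta * elapsed)."""
    return np.exp(-theta_i * elapsed)

# --- HLR: half-life regression ---
# Fixed weights (1, 1, 0) on (num attempts, num correct, num incorrect),
# plus a per-item bias theta_3 ~ N(0, 1) picked out by the one-hot encoding.
theta3 = rng.normal(0.0, 1.0, size=n)

def hlr_recall_prob(item, attempts, correct, incorrect, elapsed):
    """Half-life h = 2^(theta . x); P(recall) = 2^(-elapsed / h)."""
    log2_h = 1.0 * attempts + 1.0 * correct + 0.0 * incorrect + theta3[item]
    half_life = 2.0 ** log2_h
    return 2.0 ** (-elapsed / half_life)
```

The RL agent would interact with such a model as its environment: each scheduled review updates the attempt/correct counts, which lengthens the half-life under HLR and so raises the recall probability at the next step.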


Updated 2020-10-17

Tags

Data Science