1Cademy - Algorithm (Accelerating Human Learning With Deep Reinforcement Learning)

Learn Before

Spaced Repetition via Model-Free Reinforcement Learning (Accelerating Human Learning With Deep Reinforcement Learning)

Concept

Algorithm (Accelerating Human Learning With Deep Reinforcement Learning)

The authors combine trust region policy optimization (TRPO) with a gated recurrent unit (GRU) to solve the partially-observable Markov decision process (POMDP) formulated for spaced-repetition scheduling. At each step, the policy's input is the identity of the previous item, a binary recall outcome, and the time elapsed since that item's last review; the output is the next item to schedule. To scale the policy to a much larger number of items, the authors apply a random projection trick to reduce the dimensionality of the item-identity input.
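The input encoding and recurrent policy described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian projection matrix, the log scaling of the elapsed time, the layer sizes, and all function names (`make_item_projection`, `encode_observation`, `init_policy`, `policy_step`) are hypothetical choices made for the example. The weights are random and untrained; the TRPO update that would train them is omitted.

```python
import numpy as np


def make_item_projection(n_items, dim, seed=0):
    """Fixed random matrix giving each item a dense dim-dimensional code.

    Assumption: Gaussian entries scaled by 1/sqrt(dim); the source only
    says a random projection is used, not which kind.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_items, dim)) / np.sqrt(dim)


def encode_observation(projection, item_id, recalled, elapsed):
    """Build one policy input: item code, binary outcome, elapsed time."""
    # Row lookup is equivalent to one_hot(item_id) @ projection.
    item_code = projection[item_id]
    # Assumption: log1p compresses long delays; the source does not specify.
    return np.concatenate([item_code, [float(recalled), np.log1p(elapsed)]])


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_policy(obs_dim, hidden_dim, n_items, seed=1):
    """Random (untrained) GRU weights plus a linear output layer over items."""
    rng = np.random.default_rng(seed)

    def mat(rows, cols):
        return rng.standard_normal((rows, cols)) * 0.1

    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = mat(hidden_dim, obs_dim)
        params["U" + gate] = mat(hidden_dim, hidden_dim)
        params["b" + gate] = np.zeros(hidden_dim)
    params["Wout"] = mat(n_items, hidden_dim)
    return params


def policy_step(params, obs, h):
    """One GRU update followed by a softmax over candidate next items."""
    z = sigmoid(params["Wz"] @ obs + params["Uz"] @ h + params["bz"])  # update gate
    r = sigmoid(params["Wr"] @ obs + params["Ur"] @ h + params["br"])  # reset gate
    h_tilde = np.tanh(params["Wh"] @ obs + params["Uh"] @ (r * h) + params["bh"])
    h_new = (1.0 - z) * h + z * h_tilde
    logits = params["Wout"] @ h_new
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs, h_new


# Usage: one scheduling step for 1000 items with a 16-dimensional item code.
n_items, code_dim, hidden_dim = 1000, 16, 32
projection = make_item_projection(n_items, code_dim)
params = init_policy(code_dim + 2, hidden_dim, n_items)
h = np.zeros(hidden_dim)  # hidden state summarizes the review history so far
obs = encode_observation(projection, item_id=3, recalled=True, elapsed=10.0)
probs, h = policy_step(params, obs, h)
next_item = int(np.random.default_rng(2).choice(n_items, p=probs))
```

In this sketch the policy input has `code_dim + 2` entries regardless of how many items exist, which is the point of the projection: a one-hot identity vector would instead grow linearly with the item count. The hidden state `h` is what lets the policy act under partial observability, since it carries information from earlier reviews forward.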