Formulation (Accelerating Human Learning With Deep Reinforcement Learning)
The state space S depends on which student model is used.
For the EFC (exponential forgetting curve) model, the state encodes each item's difficulty, the delay since the item was last reviewed, and its memory strength.
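These three quantities are enough for the EFC state because together they determine the recall probability. Below is a minimal sketch that assumes the usual exponential form P[recall] = exp(-θ·d/s), with θ the item difficulty, d the delay and s the memory strength. The function name `efc_recall_prob` is hypothetical.

```python
import math


def efc_recall_prob(difficulty: float, delay: float, strength: float) -> float:
    """Recall probability for one item under an exponential forgetting curve.

    Assumed form: P[recall] = exp(-difficulty * delay / strength).
    """
    if strength <= 0:
        raise ValueError("memory strength must be positive")
    return math.exp(-difficulty * delay / strength)


# Same difficulty and delay; the item with the stronger memory decays more slowly.
weak = efc_recall_prob(difficulty=0.5, delay=10.0, strength=1.0)    # exp(-5), about 0.0067
strong = efc_recall_prob(difficulty=0.5, delay=10.0, strength=5.0)  # exp(-1), about 0.368
```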
For the HLR (half-life regression) model, the state encodes the model parameters, the delay, and the memory strength.
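To see why HLR needs the model parameters in its state, here is a sketch of the half-life regression form: a half-life h = 2^(θ·x) is predicted from a feature vector x, and recall probability is 2^(−Δ/h) for a delay Δ. Treating the half-life as the "memory strength" is our interpretation, and the feature layout `[times correct, times incorrect, bias]` and the parameter values are hypothetical.

```python
from typing import Sequence


def hlr_half_life(theta: Sequence[float], features: Sequence[float]) -> float:
    """Half-life h = 2 ** (theta . x) under half-life regression."""
    if len(theta) != len(features):
        raise ValueError("theta and features must have the same length")
    return 2.0 ** sum(t * x for t, x in zip(theta, features))


def hlr_recall_prob(theta: Sequence[float], features: Sequence[float], delay: float) -> float:
    """P[recall] = 2 ** (-delay / h); equals 0.5 when the delay is one half-life."""
    return 2.0 ** (-delay / hlr_half_life(theta, features))


# Hypothetical feature layout: [times correct, times incorrect, bias term].
theta = [1.0, -0.5, 1.0]
features = [2.0, 2.0, 1.0]  # theta . x = 2 - 1 + 1 = 2, so h = 4
p_at_half_life = hlr_recall_prob(theta, features, delay=4.0)  # 0.5
```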
For the GPL (generalized power law) model, the state encodes the student's ability, the item difficulties and, for each of the n items, the number of attempts and the number of correct answers in each of W time windows.
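The windowed counts are the least obvious part of the GPL state, so here is a sketch of one way to build it from a review history. The window edges (measured as time since each review), the `(timestamp, item, recalled)` history format and the ordering of the flattened vector are all hypothetical choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple

# One review event: (timestamp, item index, whether the item was recalled).
Review = Tuple[float, int, bool]


def window_counts(history: Sequence[Review], n_items: int, now: float,
                  edges: Sequence[float]) -> Tuple[List[List[int]], List[List[int]]]:
    """Per-item attempt and correct-answer counts in W = len(edges) - 1 windows.

    Window w holds reviews whose age (now - timestamp) lies in [edges[w], edges[w + 1]).
    """
    n_windows = len(edges) - 1
    attempts = [[0] * n_windows for _ in range(n_items)]
    correct = [[0] * n_windows for _ in range(n_items)]
    for timestamp, item, recalled in history:
        age = now - timestamp
        for w in range(n_windows):
            if edges[w] <= age < edges[w + 1]:
                attempts[item][w] += 1
                correct[item][w] += int(recalled)
                break
    return attempts, correct


def gpl_state(ability: float, difficulties: Sequence[float],
              history: Sequence[Review], now: float, edges: Sequence[float]) -> List[float]:
    """Flatten ability, item difficulties and windowed counts into one vector."""
    attempts, correct = window_counts(history, len(difficulties), now, edges)
    state = [float(ability)] + [float(d) for d in difficulties]
    for a_row, c_row in zip(attempts, correct):
        state += [float(v) for v in a_row] + [float(v) for v in c_row]
    return state


# Hypothetical example: 2 items, W = 3 windows covering ages [0, 2), [2, 8), [8, inf).
edges = [0.0, 2.0, 8.0, float("inf")]
history = [(0.0, 0, True), (5.0, 0, False), (9.0, 1, True)]
attempts, correct = window_counts(history, n_items=2, now=10.0, edges=edges)
state = gpl_state(0.3, [0.1, 0.7], history, now=10.0, edges=edges)
```

The state length grows as 1 + n + 2·n·W, which is why W is usually kept small.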
Note that the agent can access only observations, not the underlying state, so the problem is a partially observable MDP (POMDP).
For the agent, the action space consists of the n items that can be shown to the student. The transition functions of the three student models are not given explicitly. Depending on the goal, one of two reward functions is used, where Z_i = 1 if the student recalls item i:
For maximizing the expected number of items recalled: $R(s, \cdot) = \sum_{i=1}^{n} P[Z_i = 1 \mid s]$
For maximizing the likelihood of recalling all items: $R(s, \cdot) = \sum_{i=1}^{n} \log P[Z_i = 1 \mid s]$
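The two reward objectives can be sketched as follows, assuming the per-item recall probabilities P[Z_i = 1 | s] have already been computed from the student model. The function names are hypothetical, and the log version treats items as independent given the state, so its sum is the log-probability of recalling every item.

```python
import math
from typing import Sequence


def reward_expected_recall(recall_probs: Sequence[float]) -> float:
    """Expected number of items recalled: sum_i P[Z_i = 1 | s]."""
    return sum(recall_probs)


def reward_log_likelihood_all(recall_probs: Sequence[float]) -> float:
    """Log-likelihood of recalling every item: sum_i log P[Z_i = 1 | s].

    Treats items as independent given the state; every probability must be > 0.
    """
    return sum(math.log(p) for p in recall_probs)


probs = [1.0, 0.5, 0.25]
r_count = reward_expected_recall(probs)   # 1.75
r_all = reward_log_likelihood_all(probs)  # log(0.125)

# A lopsided state (one item nearly forgotten) versus a balanced one.
lopsided = [0.9, 0.9, 0.01]
balanced = [0.6, 0.6, 0.6]
```

The two objectives can rank states differently: `lopsided` scores slightly higher on expected recall, but far lower on the log reward, because log p drops sharply as any single p approaches 0. The log reward therefore pushes the agent to keep every item from being forgotten.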
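The discount factor shapes the agent's behavior: with a small discount factor, the agent helps the student study intensively in the short term, while with a large one it focuses on long-lasting learning. At every step, the observation records whether the student recalled the shown item, so the observation space is {0, 1}. The observation distribution is
$$O(o \mid s, a) = \begin{cases} P[Z_a = 1 \mid s] & \text{if } o = 1 \\ 1 - P[Z_a = 1 \mid s] & \text{if } o = 0 \end{cases}$$
where Z_a = 1 if the student recalls the shown item a.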
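Both ideas in this paragraph can be sketched briefly: sampling the binary observation for the shown item, and valuing reward streams under different discount factors. The reward streams `cram` and `spaced` are made-up numbers chosen only to show how the preferred stream can flip as γ grows; they are not results from the paper.

```python
import random
from typing import Sequence


def sample_observation(p_recall_shown: float, rng: random.Random) -> int:
    """Draw o ~ Bernoulli(P[Z_a = 1 | s]): 1 if the shown item is recalled, else 0."""
    return 1 if rng.random() < p_recall_shown else 0


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * r_t for a finite sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


rng = random.Random(0)
observations = [sample_observation(0.8, rng) for _ in range(1000)]  # roughly 80% ones

# Made-up reward streams: quick gains now versus larger gains later.
cram = [2.0, 2.0, 0.0, 0.0]
spaced = [0.0, 0.0, 3.0, 3.0]
for gamma in (0.5, 0.99):
    # cram is preferred at gamma = 0.5, spaced at gamma = 0.99
    print(gamma, discounted_return(cram, gamma), discounted_return(spaced, gamma))
```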