1Cademy - Formulation (Accelerating Human Learning with Deep Reinforcement Learning)

Learn Before

Spaced Repetition via Model-Free Reinforcement Learning (Accelerating Human Learning With Deep Reinforcement Learning)

Concept

Formulation (Accelerating Human Learning with Deep Reinforcement Learning)

When spaced repetition is optimized via model-free reinforcement learning, the environment is formulated as a partially observable Markov decision process (POMDP).

State Space ( $S$ ): Depends on the student model.
- For the EFC (exponential forgetting curve) model: $S = \mathbb{R}_{+}^{3n}$ , encoding item difficulty, delay, and memory strength.
- For the HLR (half-life regression) model: $S = \theta \times (\mathbb{R}_{+} \times X)^n$ , encoding model parameters, delay, and memory strength.
- For the GPL (generalized-power-law) model: $S = \mathbb{R} \times (\mathbb{R} \times \mathbb{N}^{2W})^n$ , encoding student ability, item difficulty, number of attempts, and number of correct answers over $W$ windows for $n$ items.
Observation Space: The agent can only access observations, not the full state. At every step, the observation records whether the student remembered the shown item: $O(z | s, \alpha) = P [Z_{\alpha} = z | s]$ , where $\alpha$ is the item shown and $z \in \{0, 1\}$ indicates whether it was recalled.
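
As a rough illustration of the observation model, the sketch below samples a binary recall outcome for the shown item. It assumes the EFC recall probability has the form exp(-difficulty * delay / strength), which is not stated on this page. The function names `efc_recall_prob` and `sample_observation` and all numeric values are hypothetical.

```python
import math
import random

def efc_recall_prob(difficulty, delay, strength):
    """Recall probability under an ASSUMED exponential forgetting curve."""
    return math.exp(-difficulty * delay / strength)

def sample_observation(state, item, rng):
    """Sample z in {0, 1} for the shown item.

    `state` is a list of (difficulty, delay, strength) triples, one per item,
    mirroring the 3n non-negative numbers of the EFC state space.
    The agent only ever sees the returned z, never `state` itself.
    """
    p = efc_recall_prob(*state[item])
    return 1 if rng.random() < p else 0

# Illustrative two-item state; the values are made up.
state = [(0.5, 2.0, 4.0), (1.0, 1.0, 0.5)]
rng = random.Random(0)
z = sample_observation(state, item=0, rng=rng)
```

Because the agent receives only `z`, it has to infer the hidden difficulty, delay, and strength from the history of shown items and outcomes.
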
Agent/Action Space: Consists of $n$ items that can be shown to the student.
Reward Function ( $\mathcal{R}$ ): Depending on the goal, there are two distinct functions:
- Maximizing expected items recalled: $\mathcal{R}(s, \bullet) = \sum_{i=1}^n P [Z_i = 1 | s]$
- Maximizing the likelihood of recalling all items: $\mathcal{R}(s, \bullet) = \sum_{i=1}^n \log P [Z_i = 1 | s]$
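
The two reward functions above can be sketched as follows, taking the per-item recall probabilities $P [Z_i = 1 | s]$ as given. The function names and the example probabilities are hypothetical. The sum of logs equals the log of the joint recall probability only if recalls are treated as independent given the state, which is assumed here.

```python
import math

def expected_recall_reward(recall_probs):
    """Expected number of items recalled: sum of per-item probabilities."""
    return sum(recall_probs)

def log_likelihood_reward(recall_probs):
    """Sum of log recall probabilities over all items.

    math.log raises ValueError if any probability is exactly 0.
    """
    return sum(math.log(p) for p in recall_probs)

# Made-up recall probabilities for three items in two different states.
uneven = [0.9, 0.9, 0.1]    # two well-known items, one nearly forgotten
balanced = [0.6, 0.6, 0.6]  # all items moderately known

# The first objective prefers `uneven`; the second prefers `balanced`.
prefers_uneven = expected_recall_reward(uneven) > expected_recall_reward(balanced)
prefers_balanced = log_likelihood_reward(balanced) > log_likelihood_reward(uneven)
```

In this example the log-based reward penalizes the nearly forgotten item heavily, so it favors keeping every item above some level of recall rather than maximizing the total.
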

Note: The discount factor ( $\gamma$ ) shapes the policy the agent learns. A smaller $\gamma$ weights near-term rewards more heavily and encourages intensive short-term studying, while a larger $\gamma$ favors long-lasting learning.
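
To make the effect of $\gamma$ concrete, the sketch below compares the discounted return of two made-up reward sequences. The sequences `cram` and `spaced` and the two $\gamma$ values are illustrative assumptions, not results from the source.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t, with t starting at 0."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Hypothetical per-step rewards under two study strategies.
cram = [3.0, 2.0, 1.0, 0.5]    # high recall now, fading quickly
spaced = [1.5, 2.0, 2.5, 3.0]  # lower recall now, improving over time

# With a small gamma, the early rewards dominate and cramming scores higher.
small_gamma_prefers_cram = discounted_return(cram, 0.1) > discounted_return(spaced, 0.1)

# With a large gamma, later rewards count almost fully and spacing scores higher.
large_gamma_prefers_spaced = discounted_return(spaced, 0.99) > discounted_return(cram, 0.99)
```
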

Updated 2026-06-07

Contributors are:

Nineli Lashkarashvili

Who are from:

San Diego State University
