Concept

Reward functions and performance metrics (Using deep reinforcement learning for personalizing review sessions on e-learning platforms with spaced repetition)

Different reward functions were used, depending on the optimization goal.

  1. Goal: maximize the expected number of recalled items: R(s, \cdot) = \sum_{i=1}^{n} P[Z_i = 1 \mid s]
  2. Goal: maximize the likelihood of recalling all items: R(s, \cdot) = \sum_{i=1}^{n} \log P[Z_i = 1 \mid s]
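The two goals above differ only in whether the per-item recall probabilities are summed directly or in log-space. A minimal sketch, assuming a hypothetical list of recall probabilities P[Z_i = 1 | s] for the current state:

```python
import math

# Hypothetical recall probabilities P[Z_i = 1 | s] for n = 4 items in state s.
recall_probs = [0.9, 0.7, 0.5, 0.95]

# Goal 1: expected number of recalled items (sum of probabilities), roughly 3.05 here.
expected_recalled = sum(recall_probs)

# Goal 2: log-likelihood of recalling *all* items (sum of log-probabilities).
# Equivalent to log of the product of the probabilities, so it is <= 0.
log_likelihood_all = sum(math.log(p) for p in recall_probs)
```

Summing logs rewards policies that keep every item above a minimal recall level, since a single item with probability near zero drags the whole sum toward negative infinity.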

In the paper, the authors define the reward function as the sum of the correct answers at every time step: R(s, \cdot) = \sum_i Z_i, where Z_i \sim P_i(\cdot \mid s).
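This empirical reward samples each outcome Z_i as a Bernoulli draw from the item's recall probability and counts the correct answers. A sketch, with the recall probabilities assumed for illustration:

```python
import random

random.seed(0)  # fixed seed so the sample is reproducible

# Hypothetical per-item recall probabilities P_i(. | s).
recall_probs = [0.9, 0.7, 0.5, 0.95]

# Sample Z_i ~ Bernoulli(P_i) and reward the number of correct answers.
outcomes = [1 if random.random() < p else 0 for p in recall_probs]
reward = sum(outcomes)
```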

The reward function for the LSTM: R_{RNN} = \sum_{i=0}^{n} P_{RNN}(Z_i^j \mid o_{0:j-1}), where n denotes the number of items, j the current interaction step, P_{RNN}(Z_i^j \mid o_{0:j-1}) the predicted probability that the user answers item i correctly at step j, and o_j = (Z_i^j, i) the observation (outcome, item) at step j.
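A sketch of this model-based reward, assuming a hypothetical stand-in `p_rnn(item, history)` for the trained recall model P_RNN (here a simple recency heuristic, not the paper's LSTM):

```python
# Placeholder for P_RNN: recall probability decays with how many steps ago
# the item was last reviewed; a real implementation would query the LSTM
# on the observation history o_{0:j-1}.
def p_rnn(item, history):
    for steps_ago, (z, i) in enumerate(reversed(history), start=1):
        if i == item:
            return max(0.05, 1.0 - 0.1 * steps_ago)
    return 0.05  # item never reviewed

# R_RNN = sum over items of P_RNN(Z_i^j = 1 | o_{0:j-1}).
def reward_rnn(n_items, history):
    return sum(p_rnn(i, history) for i in range(n_items))

# History of observations o_j = (Z_i^j, i): (outcome, item id).
history = [(1, 0), (0, 1), (1, 2)]
session_reward = reward_rnn(4, history)
```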

Updated 2020-10-29

Tags

Data Science
