Concept

Working Mechanism of DKVMN (Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using Item Response Theory)

"DKVMN model works as follows: at time t, it first receives a KC qtq_t, then predicts the probability of answering qtq_t correctly, and eventually updates the memory using the question-and-answer interaction (qtq_t, ata_t)." We can think that there are for Q different knowledge components (KCs) we have N latent concepts. These latent concepts are in key memory - MkRN×dkM^k \in \R ^{\N \times d_k}. Here dkd_k denotes the embedding size of key memory slot. Knowledge states are stored in value memory: MvRN×dvM^v \in R^{N \times d_v}. Here dvd_v also denotes the embedding size but in this case of value memory slot. DKVMN has three major steps:

  1. Getting the attention weight. First, the embedding $k_t$ of $q_t$ is looked up in the KC embedding matrix; $k_t$ is then matched against each key in the key memory matrix, giving a weight that measures how much attention should be paid to each slot of the value memory matrix: $w_{ti} = \mathrm{Softmax}(M_i^k k_t)$, with $\sum_{i=1}^{N} w_{ti} = 1$, where $M_i^k$ is the $i$-th row vector of $M^k$ and $w_{ti}$ is the $i$-th element of the weight vector.

  2. Making the prediction. First we read the latent knowledge state from the value memory $M_t^v$ to create the read vector $r_t = \sum_{i=1}^{N} w_{ti} (M_{ti}^v)^T$. The read vector and the KC embedding $k_t$ are then concatenated and used to generate a feature vector, which in turn yields the probability of the student answering knowledge component $q_t$ correctly: $f_t = \tanh(W_f [r_t, k_t] + b_f)$ and $p_t = P(\alpha_t) = \sigma(W_p f_t + b_p)$. Both of these functions are applied element-wise; the $W$'s and $b$'s are weight matrices and bias vectors.

  3. Updating the value memory. First we retrieve the embedding vector $v_t$ of the interaction $(q_t, \alpha_t)$ from the interaction embedding matrix (a matrix separate from the KC embedding one, since it encodes the question together with its answer). This embedding vector represents the knowledge growth after working on $q_t$ with the correctness label $\alpha_t$. In the update operation, part of the memory is erased before the new information is added:

$e_t = \sigma(W_e v_t + b_e)$
$a_t = \tanh(W_a v_t + b_a)$
$\tilde{M}_{t+1,i}^v = M_{ti}^v \otimes (1 - w_{ti} e_t)^T$
$M_{t+1,i}^v = \tilde{M}_{t+1,i}^v + w_{ti} a_t^T$

where $e_t$ is the erase vector and $a_t$ is the add vector (not to be confused with the answer label $\alpha_t$).
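
Putting the three steps together, here is a minimal NumPy sketch of one DKVMN time step. All sizes, the embedding matrices `A` (KCs) and `B` (interactions), the `(q, alpha)` indexing convention, and the `W`/`b` parameters are hypothetical stand-ins: in the real model they are learned end to end, whereas here they are random.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, N, d_k, d_v = 100, 20, 50, 200        # hypothetical sizes

A = rng.normal(size=(Q, d_k))            # KC embedding matrix: q_t -> k_t
B = rng.normal(size=(2 * Q, d_v))        # interaction embedding: (q_t, alpha_t) -> v_t
M_k = rng.normal(size=(N, d_k))          # key memory (static)
M_v = rng.normal(size=(N, d_v))          # value memory (updated every step)

# Hypothetical trained parameters (random stand-ins)
W_f, b_f = rng.normal(size=(d_k, d_v + d_k)), np.zeros(d_k)
W_p, b_p = rng.normal(size=(1, d_k)), np.zeros(1)
W_e, b_e = rng.normal(size=(d_v, d_v)), np.zeros(d_v)
W_a, b_a = rng.normal(size=(d_v, d_v)), np.zeros(d_v)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

q_t, alpha_t = 7, 1                      # KC id and correctness label

# Step 1: attention weights over the N latent concepts
k_t = A[q_t]
w_t = softmax(M_k @ k_t)                 # w_ti = Softmax(M_i^k k_t)

# Step 2: read the knowledge state and predict
r_t = w_t @ M_v                          # r_t = sum_i w_ti M_ti^v
f_t = np.tanh(W_f @ np.concatenate([r_t, k_t]) + b_f)
p_t = sigmoid(W_p @ f_t + b_p)           # probability of answering q_t correctly

# Step 3: erase-then-add update of the value memory
v_t = B[q_t + alpha_t * Q]               # one common (q, alpha) indexing convention
e_t = sigmoid(W_e @ v_t + b_e)           # erase vector
a_t = np.tanh(W_a @ v_t + b_a)           # add vector
M_v = M_v * (1.0 - np.outer(w_t, e_t)) + np.outer(w_t, a_t)

print(f"p_t = {p_t[0]:.3f}")
```

During training, the prediction $p_t$ is compared against the observed label $\alpha_t$ (typically with a cross-entropy loss) and all embeddings, memories, and weights are updated by backpropagation.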


Updated 2020-11-17

Tags

Data Science