Concept

Modeling Student Learning and Forgetting (DAS3H: Modeling Student Learning and Forgetting for Optimally Scheduling Distributed Practice of Skills)

As the paper notes, there are two main approaches to modeling student learning: knowledge tracing and factor analysis.

Knowledge tracing, as the name suggests, traces the student's knowledge: it models the development of student knowledge over time in order to predict the sequence of answers. Bayesian Knowledge Tracing (BKT) is the best-known knowledge tracing model.
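As a sketch of what "tracing knowledge" means in practice, here is a minimal standard BKT update step (this is textbook BKT, not code from the DAS3H paper; the parameter values are made up for illustration):

```python
# Minimal Bayesian Knowledge Tracing update (illustrative parameters, not fitted).
# p_learn: P(transition unknown -> known), p_guess: P(correct | unknown),
# p_slip: P(incorrect | known).
def bkt_update(p_know, correct, p_learn=0.2, p_guess=0.25, p_slip=0.1):
    """Return updated P(skill known) after observing one answer."""
    if correct:
        num = p_know * (1 - p_slip)
        den = num + (1 - p_know) * p_guess
    else:
        num = p_know * p_slip
        den = num + (1 - p_know) * (1 - p_guess)
    posterior = num / den                          # Bayes step on the observation
    return posterior + (1 - posterior) * p_learn   # learning transition

# Trace belief over a short answer sequence (True = correct).
p = 0.3
for answer in (True, True, False):
    p = bkt_update(p, answer)
```

Because the update is applied answer by answer, the order of observations matters, which is exactly what distinguishes knowledge tracing from factor analysis.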

Factor analysis, in contrast to knowledge tracing, does not take the order of observations into account.

  1. Item Response Theory (IRT) is a factor analysis model:

$$P(Y_{s,j} = 1) = \sigma(\alpha_s - \delta_j)$$

Here $\alpha_s$ denotes the student's ability and $\delta_j$ the item's difficulty (student ability is assumed fixed over time). At first glance this may seem like a very simple model, but it can outperform knowledge tracing architectures such as Deep Knowledge Tracing.

  2. Multidimensional Item Response Theory (MIRT) is an extension of IRT:

$$P(Y_{s,j} = 1) = \sigma(\langle \alpha_s, \delta_j \rangle + d_j)$$

Here $\alpha_s$ and $\delta_j$ denote the same quantities as before, except that they are now multidimensional vectors, and $d_j$ captures the easiness of item $j$.
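A quick sketch of how the IRT and MIRT predictions above are computed (the parameter values below are made up for illustration, not fitted ones):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# IRT: scalar ability and difficulty (illustrative values).
alpha_s, delta_j = 1.2, 0.5
p_irt = sigmoid(alpha_s - delta_j)          # ability above difficulty -> p > 0.5

# MIRT: multidimensional ability/difficulty vectors plus an easiness bias d_j.
alpha_vec = np.array([0.8, -0.3, 1.1])
delta_vec = np.array([0.5, 0.2, -0.4])
d_j = 0.1
# Inner product <alpha_s, delta_j> plus the easiness term, through the sigmoid.
p_mirt = sigmoid(alpha_vec @ delta_vec + d_j)
```

With these made-up numbers the MIRT logit happens to be roughly zero, so `p_mirt` comes out near 0.5 despite the student's high ability on the third dimension.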

In more recent work, the student's practice history was also incorporated into factor analysis, leading to the following models: the Additive Factor Model (AFM) and Performance Factor Analysis (PFA).

The AFM formula is:

$$P(Y_{s,j} = 1) = \sigma\Big(\sum_{k \in KC(j)} \beta_k + \gamma_k \alpha_{s,k}\Big)$$

Here $\beta_k$ is the bias (easiness) of skill $k$, $\alpha_{s,k}$ is the number of prior attempts of student $s$ on skill $k$, and $\gamma_k$ weights those attempts; the sum runs over the skills $KC(j)$ tagging item $j$.

PFA replaces the attempt count with separate counts of correct and incorrect answers:

$$P(Y_{s,j} = 1) = \sigma\Big(\sum_{k \in KC(j)} \beta_k + \gamma_k c_{s,k} + \rho_k f_{s,k}\Big)$$

where $c_{s,k}$ ($f_{s,k}$) is the number of correct (wrong) answers of student $s$ on skill $k$.
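The PFA prediction can be sketched as follows (the skill names and parameter values are illustrative placeholders, not fitted values):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Per-skill parameters (illustrative): beta = skill easiness,
# gamma = weight of correct answers, rho = weight of incorrect answers.
beta  = {"fractions": -0.2, "decimals": 0.1}
gamma = {"fractions": 0.3,  "decimals": 0.2}
rho   = {"fractions": 0.05, "decimals": 0.1}

def pfa(skills, correct_counts, wrong_counts):
    """PFA: sum the per-skill terms over the skills KC(j) tagging item j."""
    logit = sum(beta[k] + gamma[k] * correct_counts[k] + rho[k] * wrong_counts[k]
                for k in skills)
    return sigmoid(logit)

# A student with 3 correct and 1 wrong answer on a "fractions"-tagged item.
p = pfa(["fractions"], {"fractions": 3}, {"fractions": 1})
```

Note that both counts enter with positive weights here: in PFA even a wrong answer is practice, so $\rho_k$ is usually smaller than $\gamma_k$ but not necessarily negative.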

Knowledge Tracing Machines (KTM) encompass IRT, MIRT, AFM, and PFA as special cases. The probability of a correct answer is expressed as:

$$P(Y_t = 1) = \sigma\Big(\mu + \sum_{i=1}^N w_i x_{t,i} + \sum_{1 \le i < l \le N} x_{t,i} x_{t,l} \langle v_i, v_l \rangle\Big)$$

(NOTE this model was proposed in the paper: Knowledge Tracing Machines: Factorization Machines for Knowledge Tracing)

Here $\mu$ is a global bias, $N$ is the number of features (for instance, item parameters), and $x_t$ is a sparse vector gathering all features collected at time $t$: which student answers which item, plus information about prior attempts. $w_i$ is the bias of feature $i$ ($w$ is the vector of biases) and $v_i \in \mathbb{R}^d$ is its embedding of dimension $d$. Only the entries of $x_t$ that are greater than 0 contribute to the prediction, and since each sample typically involves only a few active features, this probability can be computed efficiently.
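The KTM equation is a factorization machine, and its pairwise term can be computed in $O(Nd)$ using the standard FM identity rather than summing over all pairs. A minimal sketch with random (untrained) parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 3                   # number of features, embedding dimension

mu = 0.0                      # global bias
w = rng.normal(size=N)        # per-feature biases w_i
V = rng.normal(size=(N, d))   # per-feature embeddings v_i

def ktm_predict(x):
    """Factorization-machine prediction: bias + linear + pairwise <v_i, v_l> terms."""
    linear = w @ x
    # FM identity: sum_{i<l} x_i x_l <v_i, v_l>
    #            = 0.5 * (||sum_i x_i v_i||^2 - sum_i x_i^2 ||v_i||^2)
    s = V.T @ x
    pair = 0.5 * (s @ s - np.sum((x ** 2) * np.sum(V ** 2, axis=1)))
    return 1.0 / (1.0 + np.exp(-(mu + linear + pair)))

# Sparse one-hot sample: e.g. "student 1 answers item 4" activates features 1 and 4.
x = np.zeros(N)
x[1] = x[4] = 1.0
p = ktm_predict(x)
```

With a one-hot encoding like this, the pairwise term reduces to a single inner product between the active embeddings, which is how KTM recovers MIRT-style student-item interactions.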

DASH (Difficulty, Ability, and Student History) bridges the gap between factor analysis and memory models: it takes both learning and forgetting processes into account.

The authors extend DASH so that it can account for both multiple-skill tagging and memory decay; the resulting model is DAS3H.
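The history term in DASH aggregates a student's attempts and successes over a set of expanding time windows, so that recent and spaced practice are weighted differently. A rough sketch of this counting idea (the window lengths and weights below are illustrative placeholders, not the paper's fitted parameters):

```python
import math

# Expanding time windows in days, with per-window weights (all illustrative).
windows = [1, 7, 30]
theta_c = [0.6, 0.4, 0.2]   # weight of correct-answer counts per window
theta_a = [0.3, 0.2, 0.1]   # weight of attempt counts per window

def dash_memory_term(attempt_times, correct_times, now):
    """DASH-style history term: weighted log-counts of practice in each window."""
    h = 0.0
    for win, tc, ta in zip(windows, theta_c, theta_a):
        c = sum(1 for t in correct_times if now - t <= win)   # recent successes
        a = sum(1 for t in attempt_times if now - t <= win)   # recent attempts
        # log(1 + count) gives diminishing returns for massed practice.
        h += tc * math.log(1 + c) - ta * math.log(1 + a)
    return h

# Attempts on days 1, 3, 9 (correct on days 3 and 9), evaluated on day 10.
h = dash_memory_term(attempt_times=[1, 3, 9], correct_times=[3, 9], now=10)
```

As time passes, events fall out of the shorter windows first, so the term decays between practice sessions, which is how forgetting enters an otherwise factor-analysis-style model.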

This is an example of the activation of a knowledge tracing machine:

[Figure: activation of a knowledge tracing machine (image not preserved)]

Updated 2020-11-26

Tags

Data Science
