Learn Before
Concept

Formalising Curriculum Learning

Let z be a random variable representing an example for the learner (possibly an (x, y) pair for supervised learning). Let P(z) be the target training distribution from which the learner should ultimately learn a function of interest. Let $0 \leq W_{\lambda}(z) \leq 1$ be the weight applied to example z at step $\lambda$ in the curriculum sequence, with $0 \leq \lambda \leq 1$ and $W_{1}(z) = 1$. The corresponding training distribution at step $\lambda$ is $Q_{\lambda}(z) \propto W_{\lambda}(z)\, P(z) \;\; \forall z$.

Consider a monotonically increasing sequence of $\lambda$ values, starting from $\lambda = 0$ and ending at $\lambda = 1$. The corresponding sequence of distributions $Q_{\lambda}$ is called a curriculum if the entropy of these distributions increases, $H(Q_{\lambda}) < H(Q_{\lambda+\epsilon}) \;\; \forall \epsilon > 0$, and $W_{\lambda}(z)$ is monotonically increasing in $\lambda$, i.e., $W_{\lambda+\epsilon}(z) \geq W_{\lambda}(z) \;\; \forall z, \forall \epsilon > 0$.

This defines a sequence of training distributions. The weights initially favor the simpler examples, which can be learned relatively easily. As training progresses, the weights are adapted to raise the probability that difficult examples enter training, and the entropy of the distribution increases as a result.
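A minimal numerical sketch of these definitions, assuming a small discrete example set, a uniform target distribution P, and a hypothetical step-function weighting in which example z enters training once $\lambda$ exceeds an illustrative difficulty score d(z) (none of these specifics come from the definition itself; they just satisfy its conditions):

```python
import numpy as np

def entropy(q):
    """Shannon entropy of a discrete distribution (0 log 0 := 0)."""
    q = q[q > 0]
    return -np.sum(q * np.log(q))

# Hypothetical target distribution P over four examples, ordered easy -> hard.
P = np.array([0.25, 0.25, 0.25, 0.25])

# Illustrative difficulty scores d(z); chosen so all examples are in by lambda = 1.
difficulty = np.array([0.0, 0.3, 0.6, 0.9])

def Q(lam):
    """Training distribution Q_lambda(z) ∝ W_lambda(z) P(z)."""
    W = (lam >= difficulty).astype(float)  # 0 <= W_lambda(z) <= 1, increasing in lambda
    q = W * P
    return q / q.sum()                     # normalise so Q_lambda is a distribution

# As lambda grows, harder examples enter training and H(Q_lambda) increases.
for lam in (0.0, 0.4, 0.7, 1.0):
    print(f"lambda={lam}: Q={Q(lam)}, H={entropy(Q(lam)):.3f}")
```

At $\lambda = 1$ all weights are 1, so $Q_1 = P$, and the printed entropies are strictly increasing, matching both conditions in the definition of a curriculum.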


Updated 2021-06-24

Tags

Data Science