Definition

Mathematical Formulation of Curriculum Learning

Let $z$ be a random variable representing an example for the learner (possibly an $(x, y)$ pair for supervised learning). Let $P(z)$ be the target training distribution from which the learner should ultimately learn a function of interest. Let $0 \leq W_{\lambda}(z) \leq 1$ be the weight applied to example $z$ at step $\lambda$ in the curriculum sequence, with $0 \leq \lambda \leq 1$ and $W_{1}(z) = 1$. The corresponding training distribution at step $\lambda$ is $Q_{\lambda}(z) \propto W_{\lambda}(z) P(z) \;\forall z$, so that $Q_{1}(z) = P(z)$. Consider a monotonically increasing sequence of $\lambda$ values, starting from $\lambda = 0$ and ending at $\lambda = 1$. The corresponding sequence of distributions $Q_{\lambda}$ is a curriculum if the entropy of these distributions increases, i.e., $H(Q_{\lambda}) < H(Q_{\lambda+\epsilon}) \;\forall \epsilon > 0$, and $W_{\lambda}(z)$ is monotonically increasing in $\lambda$, i.e., $W_{\lambda+\epsilon}(z) \geq W_{\lambda}(z) \;\forall z, \forall \epsilon > 0$. This defines a sequential training process in which the weights initially favor simpler examples that can be learned relatively easily. As $\lambda$ grows, the weighting is adapted to raise the probability that more difficult examples enter the training set, and the entropy of the training distribution increases as a result.
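To make the two conditions concrete, the following is a minimal numerical sketch, not part of the original definition. It assumes a small discrete set of examples, a uniform target distribution $P$, a made-up per-example difficulty score, and a hypothetical weighting $W_{\lambda}(z) = \lambda^{d(z)}$ chosen only because it satisfies $0 \leq W_{\lambda}(z) \leq 1$ and $W_{1}(z) = 1$. The names `curriculum_distribution`, `entropy`, and `difficulty` are illustrative.

```python
import numpy as np

def curriculum_distribution(lam, difficulty, p):
    """Return (W_lambda, Q_lambda) for a discrete set of examples."""
    # Hypothetical weighting (an assumption): W_lambda(z) = lam ** d(z).
    # It lies in [0, 1], is non-decreasing in lam, and equals 1 at lam = 1.
    w = np.power(lam, difficulty)
    q = w * p                      # Q_lambda(z) is proportional to W_lambda(z) P(z)
    return w, q / q.sum()          # normalize so Q_lambda sums to 1

def entropy(q):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    nz = q[q > 0]
    return float(-(nz * np.log(nz)).sum())

# Made-up difficulty scores; d = 0 marks the easiest example.
difficulty = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
# Assumed uniform target distribution P(z).
p = np.full(difficulty.size, 1.0 / difficulty.size)
# Monotonically increasing sequence of lambda values from 0 to 1.
lambdas = np.linspace(0.0, 1.0, 6)

weights, entropies = [], []
for lam in lambdas:
    w, q = curriculum_distribution(lam, difficulty, p)
    weights.append(w)
    entropies.append(entropy(q))

weights = np.array(weights)        # shape: (num_steps, num_examples)
entropies = np.array(entropies)
final_q = q                        # Q_1, which should coincide with P

# Condition 1: entropy strictly increases along the sequence.
print("entropy increasing:", bool(np.all(np.diff(entropies) > 0)))
# Condition 2: each example's weight never decreases as lambda grows.
print("weights non-decreasing:", bool(np.all(np.diff(weights, axis=0) >= 0)))
```

With these assumptions, $Q_{0}$ puts all its mass on the easiest example and $Q_{1}$ recovers $P$. The check only covers the chosen grid of $\lambda$ values; for a non-uniform $P$ or a different weighting, the entropy condition has to be verified separately.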


Updated 2026-05-16

Tags

Data Science