Limitation of Training Data Quantity
A second data-related limitation is insufficient quantity. Whenever neural networks and deep learning are involved in the learning process, the number of available examples becomes essential.
Because the causality label is known from prior knowledge for comparatively few variable pairs, many authors rely on data augmentation, either generating new artificial examples from scratch or perturbing the available examples. However, theoretical results require that causal classifiers be trained and evaluated on examples drawn from the same “mother distribution”.
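The two augmentation routes mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular author's procedure: the function names (`synthesize_pair`, `perturb_pair`, `augment`), the additive-noise generator with a `tanh` mechanism, and the jitter scale are all hypothetical choices made for the example. Note that the synthetic pairs follow whatever distribution the generator defines, which need not match the distribution of the real pairs.

```python
import numpy as np

def synthesize_pair(rng, n_points=200):
    """Generate one artificial pair from scratch, with x causing y.

    Assumption: a simple additive noise model y = tanh(a*x) + noise.
    """
    x = rng.normal(size=n_points)
    a, b = rng.uniform(0.5, 2.0, size=2)
    y = np.tanh(a * x) + 0.1 * b * rng.normal(size=n_points)
    return x, y

def perturb_pair(rng, x, y, noise_scale=0.05):
    """Jitter an existing pair; the causal label is assumed to be unchanged."""
    x_new = x + noise_scale * x.std() * rng.normal(size=x.shape)
    y_new = y + noise_scale * y.std() * rng.normal(size=y.shape)
    return x_new, y_new

def augment(rng, labeled_pairs, n_synthetic=10, n_perturbed_per_pair=2):
    """Extend a list of (a, b, label) triples.

    label is 1 when the first variable causes the second, 0 otherwise.
    """
    out = list(labeled_pairs)
    # Route 1: perturb the available labeled examples.
    for a, b, label in labeled_pairs:
        for _ in range(n_perturbed_per_pair):
            a_new, b_new = perturb_pair(rng, a, b)
            out.append((a_new, b_new, label))
    # Route 2: generate new examples from scratch, swapping the roles
    # at random so that both labels are represented.
    for _ in range(n_synthetic):
        x, y = synthesize_pair(rng)
        if rng.random() < 0.5:
            out.append((x, y, 1))
        else:
            out.append((y, x, 0))
    return out
```

For instance, starting from a single labeled pair, `augment(rng, [(x, y, 1)], n_synthetic=4, n_perturbed_per_pair=3)` returns a list of eight labeled pairs: the original, three perturbed copies, and four synthetic ones.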
As in any machine learning problem, the simplest setting is the i.i.d. setting, in which training and test data are drawn from the same distribution. The same applies to the cause-effect pair problem: higher performance is attained when the training and test pairs are drawn from the same mother distribution.
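One way to write this setting down is given here. The notation is introduced only for illustration and is an assumption, not taken from the source: $D_i$ is the sample of a variable pair, $\ell_i$ its causal label, $\mathcal{M}$ the mother distribution of the training pairs, $\mathcal{M}'$ that of the test pairs, and $f$ the causal classifier.

```latex
% Training pairs and their labels are drawn i.i.d. from a mother distribution M:
(D_i, \ell_i) \overset{\text{i.i.d.}}{\sim} \mathcal{M}, \qquad i = 1, \dots, n
% Error of a causal classifier f on pairs drawn from a test distribution M':
R_{\mathcal{M}'}(f) = \mathbb{E}_{(D, \ell) \sim \mathcal{M}'} \big[ \mathbf{1}\{ f(D) \neq \ell \} \big]
% The i.i.d. setting is the special case \mathcal{M}' = \mathcal{M}.
```

In this notation, the i.i.d. setting corresponds to $\mathcal{M}' = \mathcal{M}$, and augmented examples are only covered by it if they also follow $\mathcal{M}$.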
Unfortunately, in many real-world applications, one does not know from which “mother distribution” a new incoming pair to be classified is drawn, nor does one have labeled examples of cause-effect pairs from the “mother distribution” of interest.
Tags
Data Science