1Cademy - Need for Real Datasets of a Big Size

Need for Real Datasets of a Big Size

An important problem that is often overlooked in the “cause-effect pair community” concerns benchmarking. The main real-data benchmark the community uses to compare methods is the Tübingen dataset, composed of only 100 pairs, many of which are very similar. It is indeed very difficult to collect real-world cause-effect pairs that have enough data points and an authenticated ground truth. But one must keep in mind that, on so small a benchmark, it is very easy for most methods to tune their hyperparameters (even unintentionally) to obtain the best results.

This overfitting problem is often compounded by the fact that the dataset is, most of the time, not separated into train/validation/test sets. To overcome this problem, Guyon proposed a Cause-Effect Pair Challenge with real and artificial data generated by various mechanisms.
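
As a rough illustration of the train/validation/test separation mentioned above, the following minimal sketch splits a list of pair identifiers once, with a fixed seed, so that hyperparameters can be tuned on the validation part and the test part is only used for the final comparison. The function name `split_pairs`, the split fractions, and the integer identifiers are assumptions made for this example; they do not come from the Tübingen dataset or from the challenge.

```python
import random


def split_pairs(pair_ids, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle pair identifiers once and cut them into train/validation/test."""
    ids = list(pair_ids)
    # A fixed seed keeps the split reproducible across experiments.
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# Hypothetical identifiers standing in for a benchmark of 100 pairs.
train, val, test = split_pairs(range(1, 101))
print(len(train), len(val), len(test))
```

Because many pairs in a small benchmark can be very similar, a purely random split may still place near-duplicates on both sides of the split; grouping related pairs before splitting is one possible refinement, which this sketch does not attempt.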

Updated 2020-07-24

Contributors are:

Who are from:

University of Michigan - Ann Arbor

References

Cause Effect Pairs in Machine Learning
