Need for Large Real Datasets

An important problem that is often overlooked in the “cause-effect pair community” concerns benchmarking. The main real-data benchmark the community uses to compare methods is the Tübingen Dataset, composed of only about one hundred pairs, many of which are very similar. It is indeed very difficult to collect real-world cause-effect pairs that have both enough data points and a verified ground truth. But one must keep in mind that, on such a small benchmark, it is very easy for most methods to have their hyperparameters tuned (even unintentionally) to obtain the best results.

This overfitting problem is often compounded by the fact that the dataset is usually not split into train/validation/test sets, so hyperparameters end up being tuned on the same pairs that are used to report results. To address this problem, Guyon proposed a Cause-effect Pair Challenge that includes real and artificial data generated with various mechanisms.
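
As a rough illustration of the split discussed above, the following sketch partitions a list of pair identifiers into train, validation, and test subsets before any tuning takes place. The function name `split_pairs`, the 60/20/20 proportions, and the identifiers 1 to 100 are assumptions made for this example; they are not part of the Tübingen Dataset or of the challenge.

```python
import random


def split_pairs(pair_ids, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle pair identifiers and split them into train/validation/test.

    Hypothetical helper: the fractions and the seed are illustrative defaults.
    """
    ids = list(pair_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)

    n_test = round(len(ids) * test_frac)
    n_val = round(len(ids) * val_frac)

    test = ids[:n_test]                 # held out, used only for final reporting
    val = ids[n_test:n_test + n_val]    # used for hyperparameter tuning
    train = ids[n_test + n_val:]        # used for fitting
    return train, val, test


# Assumed identifiers for a benchmark of 100 pairs.
pair_ids = range(1, 101)
train_ids, val_ids, test_ids = split_pairs(pair_ids)
print(len(train_ids), len(val_ids), len(test_ids))
```

Hyperparameters would then be chosen on the validation pairs only, and the test pairs used once to report the final score. With so few pairs, and many of them similar, each subset remains small, which is part of the argument for larger benchmarks.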

Updated 2020-07-24

Tags

Data Science