Need for Real Datasets of a Big Size
An important problem that is often overlooked in the “cause-effect pair community” concerns benchmarking. The main real-data benchmark the community uses to compare methods is the Tübingen dataset, which is composed of only about one hundred pairs, many of them very similar. It is indeed very difficult to collect real-world cause-effect pairs that have both enough data points and an authenticated ground truth. But one must keep in mind that, on so small a benchmark, it is very easy to tune the hyperparameters of most methods (even unintentionally) so as to obtain the best results.
This overfitting problem is often compounded by the fact that the dataset is, most of the time, not separated into train/validation/test sets. To overcome this problem, a Cause-Effect Pair Challenge has been proposed by Guyon, with real and artificial data generated with various mechanisms.
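As an illustration of the train/validation/test separation mentioned above, here is a minimal sketch in Python. The function name `split_pairs`, the pair identifiers, and the 60/20/20 proportions are all assumptions made for the example; they do not come from the Tübingen dataset or from the challenge. The idea is only that hyperparameters are tuned on the validation pairs, while the test pairs are held out until the final evaluation.

```python
import random


def split_pairs(pair_ids, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle pair identifiers and partition them into train/validation/test.

    Hypothetical helper: the fractions and the seed are illustrative defaults.
    """
    ids = list(pair_ids)
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# Hypothetical identifiers standing in for a benchmark of 100 pairs.
pair_ids = [f"pair{i:04d}" for i in range(1, 101)]
train, val, test = split_pairs(pair_ids)
print(len(train), len(val), len(test))
```

Because many pairs in a small benchmark can be very similar, a purely random split may still place near-duplicates on both sides of the split. Grouping related pairs and assigning each group to a single subset is one possible refinement.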
Tags
Data Science
Related
Relax the Causal Sufficiency Assumption
Need for Real Datasets of a Big Size
Biased Assessment Due to Artifacts in Data
Extension of the Generative Approach for Categorical Variables
Extension of the Pairwise Setting for Complete Graph Inference
Computational Complexity Limitations
Relax Restrictive Assumptions on Causal Mechanisms