Learn Before
Mitigating Test Set Leakage
To preserve the statistical validity of a test dataset and prevent adaptive overfitting, practitioners should consult it as infrequently as possible. When reporting confidence intervals for model performance, account for the inflated error rates that arise from multiple hypothesis testing. Furthermore, when running a series of benchmark challenges or engaging in prolonged model development, a recommended practice is to maintain several distinct test datasets: after each round of evaluation, demote the just-used test dataset to a validation dataset, so that the final evaluation remains unbiased and strictly isolated from the modeler's design decisions.
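The two practices above, widening confidence intervals when the test data is consulted repeatedly and rotating held-out test sets so each final evaluation stays unbiased, can be sketched in Python. This is a minimal illustration, not code from D2L; the names `bonferroni_ci` and `TestSetRotation` are hypothetical, and the interval uses a normal approximation with a Bonferroni correction as one simple way to handle multiple hypothesis testing.

```python
import math
import statistics


def bonferroni_ci(accuracy, n_samples, n_tests, alpha=0.05):
    """Normal-approximation confidence interval for accuracy, widened by a
    Bonferroni correction: alpha is split across the n_tests times the test
    data is consulted, keeping the family-wise error rate at alpha."""
    # Two-sided z quantile at the corrected level alpha / n_tests.
    z = statistics.NormalDist().inv_cdf(1 - (alpha / n_tests) / 2)
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n_samples)
    return (accuracy - half_width, accuracy + half_width)


class TestSetRotation:
    """Maintain several never-touched test sets; after each evaluation
    round, demote the consumed test set to the validation pool, where it
    may be reused for model selection but no longer counts as unbiased."""

    def __init__(self, held_out_sets):
        self.fresh = list(held_out_sets)  # never-consulted test sets
        self.validation_pool = []         # demoted, reusable for tuning

    def next_test_set(self):
        if not self.fresh:
            raise RuntimeError("no unbiased test sets remain")
        current = self.fresh.pop(0)
        self.validation_pool.append(current)  # demoted after this round
        return current
```

For example, with five evaluation rounds planned, `bonferroni_ci(0.9, 1000, n_tests=5)` yields a wider interval than a single-use evaluation would, reflecting the extra looks at the test data.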
Tags
D2L
Dive into Deep Learning @ D2L