Random 70/30 Train/Test Split Can Fail Under Distribution Shift
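A random 70%/30% split into training and test sets can be misleading when the available training data comes from a different distribution than the one the system ultimately needs to perform well on. Because a random split draws the test set from the same pool as the training set, the measured test error reflects that pool rather than the target distribution.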
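A minimal sketch of the effect, using only the Python standard library. The pool sizes (9,000 auxiliary and 1,000 target examples), the source labels, and the helper names `random_split` and `target_fraction` are assumptions made for illustration and do not come from the original card. The sketch compares how much target-distribution data ends up in the test set under a random 70/30 split versus a split that reserves target data for testing.

```python
import random


def random_split(examples, train_frac=0.7, seed=0):
    """Shuffle all examples together, then cut at train_frac."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def target_fraction(examples):
    """Fraction of examples whose source label is 'target'."""
    return sum(1 for source, _ in examples if source == "target") / len(examples)


# Hypothetical pool: plenty of auxiliary data, little target data.
pool = [("auxiliary", i) for i in range(9000)] + [("target", i) for i in range(1000)]

# Random 70/30 split: the test set mirrors the pool, so it is mostly auxiliary data.
train, test = random_split(pool)
random_test_target_frac = target_fraction(test)

# Alternative: build the test set only from target-distribution examples,
# and put the remaining target examples into training with the auxiliary data.
target = [ex for ex in pool if ex[0] == "target"]
auxiliary = [ex for ex in pool if ex[0] == "auxiliary"]
random.Random(0).shuffle(target)
target_test = target[:500]
target_train = auxiliary + target[500:]

print(f"target share of random test set:   {random_test_target_frac:.2f}")
print(f"target share of reserved test set: {target_fraction(target_test):.2f}")
```

Under these assumed proportions, the randomly drawn test set contains only a small share of target examples, so its error mostly describes auxiliary data. The reserved test set measures performance on the distribution that matters, at the cost of a training set that no longer matches the test set.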
Tags
Machine Learning
Deep Learning
Machine Learning Strategy
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Related
Avoid Randomly Shuffling Mixed-Source Data into Dev/Test Sets
Include Some Target-Distribution Examples in Training Alongside Auxiliary Data
Down-Weighting Auxiliary Data from a Different Distribution
Training Dev Set
Error Table Across Two Data Distributions and Three Error Types
Data Mismatch Between Training and Dev Set Distributions
Limited Practical Scope of Domain Adaptation for Different Data Distributions
Domain Adaptation for Different Data Distributions
Website Images and Mobile Phone Pictures as a Distribution Mismatch Example