Why Standard Data Splits Fail With Different Future Distributions
Question: Explain why simply designating 30% of your available training data as a test set is problematic when you expect future data to differ in nature from your training data. Describe how you should choose your dev and test sets under these conditions according to Machine Learning Yearning.
Sample answer: A standard 30% split assumes the training distribution is the same as the test distribution. If future data differs from training data, the resulting test set will only reflect the training distribution rather than what the system expects to receive in the future. According to Machine Learning Yearning, dev and test sets should be chosen to reflect the data one expects to get in the future and wants to perform well on, rather than simply whatever data happens to be available for training.
Key points:
- A 30% random split fails to capture differing future distributions.
- Training and test distributions should not be assumed to be the same.
- Dev and test sets must reflect future data that the system needs to perform well on.
Rubric: The answer should explain that a 30% split fails because it assumes the training and test distributions are identical, and explain that dev/test sets must reflect the future data one expects to get and wants to perform well on.
0
1
References
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
Adding More Training Data Does Not Always Help
Special Challenges from Different Training and Dev/Test Distributions
Risk of Merging Training Data Sources Depends on Algorithm Flexibility
Shared Label Mapping Across Data Sources
Training and Dev/Test Sets from Different Distributions
Inconsistent Auxiliary Data Source
Approximating Future Dev/Test Data Before Launch
Updating Dev/Test Sets with Actual User Data After Launch
Risk of Starting with Website Images When Future-Like Data Is Unavailable
Development Investment for Dev and Test Sets Requires Judgment
According to Machine Learning Yearning, what is the primary criterion for choosing dev and test sets?
True or False: When building a dev/test set, it is safe to assume the training distribution is the same as the test distribution.
Dev and test sets should contain examples that reflect what you ultimately want to perform well on, rather than only the _____ you happen to have for training.
Why is using a simple 30% random split of available data as your test set problematic when future data differs from training data?
According to ML Yearning, it is generally safe to assume your training data distribution is the same as your test data distribution.
Dev and test sets should be chosen to reflect data you expect to get in the _____ and want to do well on.
Match each dev/test set concept from ML Yearning to its correct description.
Order the steps for correctly choosing dev and test sets according to ML Yearning's guidance.
According to ML Yearning, what should the examples in your dev and test sets primarily reflect?
According to ML Yearning, dev and test sets must always come from the same distribution as the training data.
ML Yearning warns that the test set should not simply be _____ of the available data when future data differs from the training set.
Match each data scenario to the correct dev/test set strategy decision according to ML Yearning.
Order the reasoning steps for deciding whether a proposed dev/test set is well-chosen, per ML Yearning.
Why Standard Data Splits Fail With Different Future Distributions
Dev and Test Set Design for Mobile Image Applications
The Core Criterion for Dev and Test Set Selection