Learn Before
Choosing Dev and Test Sets to Reflect Future Data
Dev and test sets should be chosen to reflect data one expects to receive in the future and wants to perform well on. The test set should not simply be 30% of the available data when future data is expected to differ from the training data. One should not assume the training distribution is the same as the test distribution; examples should reflect what one ultimately wants to perform well on rather than only the data available for training.
0
1
References
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
Purpose of Dev and Test Sets
Choosing Dev and Test Sets to Reflect Future Data
Choosing Dev and Test Sets from the Same Distribution When Possible
Changing Dev/Test Sets or Evaluation Metrics During a Project Is Common
Quickly Establishing Dev/Test Sets and Metric for New Applications
Sizing the Dev Set to Detect Meaningful Accuracy Changes
What is the primary purpose of a development (dev) set in a machine learning project?
The development set is sometimes called the 'hold-out cross validation set'.
The development set is sometimes also called the _____ cross validation set.
Match each development set role to its description in Machine Learning Yearning.
Order the steps for using a dev set to choose between two model configurations.
Which of the following is NOT a stated use of the development set in Machine Learning Yearning?
The dev set and the training set refer to the same data partition in Machine Learning Yearning.
The dev set is used to tune parameters, _____ features, and make other decisions about the learning algorithm.
Match each dev set use from Machine Learning Yearning to the activity that exemplifies it.
Arrange the stages of a machine learning project that involve the dev set in the correct workflow order.
Explain the role and alternative terminology for the development set.
Determine the correct data partition for tuning a model's parameters and features.
List the three primary uses of the development set.
Learn After
Adding More Training Data Does Not Always Help
Special Challenges from Different Training and Dev/Test Distributions
Risk of Merging Training Data Sources Depends on Algorithm Flexibility
Shared Label Mapping Across Data Sources
Training and Dev/Test Sets from Different Distributions
Inconsistent Auxiliary Data Source
Approximating Future Dev/Test Data Before Launch
Updating Dev/Test Sets with Actual User Data After Launch
Risk of Starting with Website Images When Future-Like Data Is Unavailable
Development Investment for Dev and Test Sets Requires Judgment
According to Machine Learning Yearning, what is the primary criterion for choosing dev and test sets?
True or False: When building a dev/test set, it is safe to assume the training distribution is the same as the test distribution.
Dev and test sets should contain examples that reflect what you ultimately want to perform well on, rather than only the _____ you happen to have for training.
Why is using a simple 30% random split of available data as your test set problematic when future data differs from training data?
According to ML Yearning, it is generally safe to assume your training data distribution is the same as your test data distribution.
Dev and test sets should be chosen to reflect data you expect to get in the _____ and want to do well on.
Match each dev/test set concept from ML Yearning to its correct description.
Order the steps for correctly choosing dev and test sets according to ML Yearning's guidance.
According to ML Yearning, what should the examples in your dev and test sets primarily reflect?
According to ML Yearning, dev and test sets must always come from the same distribution as the training data.
ML Yearning warns that the test set should not simply be _____ of the available data when future data differs from the training set.
Match each data scenario to the correct dev/test set strategy decision according to ML Yearning.
Order the reasoning steps for deciding whether a proposed dev/test set is well-chosen, per ML Yearning.
Why Standard Data Splits Fail With Different Future Distributions
Dev and Test Set Design for Mobile Image Applications
The Core Criterion for Dev and Test Set Selection