1Cademy - Training and Dev/Test Sets from Different Distributions

Learn Before

Choosing Dev and Test Sets to Reflect Future Data

Concept

Training and Dev/Test Sets from Different Distributions

Training data can come from a different distribution than the data one ultimately cares about. In the cat-app example, website images made up the training and test sets, while pictures taken with mobile phones were the actual distribution of interest. Because the test set contained no mobile-phone pictures, it could not reveal that the algorithm generalized poorly to them. Therefore, dev and test sets should be chosen to reflect the data one expects to get in the future and wants to do well on, even when the training set draws on a different distribution.
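The split rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions: images are represented by file names, `make_splits` is a hypothetical helper (not from the source), and the set sizes in the usage example are made-up numbers. Dev and test are drawn only from the target (mobile) pictures. The remaining mobile pictures and all auxiliary website images go into training.

```python
import random
from typing import List, Tuple


def make_splits(
    mobile: List[str],
    web: List[str],
    dev_size: int,
    test_size: int,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical helper: dev/test come only from the target (mobile) data."""
    if dev_size + test_size > len(mobile):
        raise ValueError("not enough target-distribution examples for dev and test")
    rng = random.Random(seed)
    target = list(mobile)  # copy so the caller's list is not shuffled in place
    rng.shuffle(target)
    dev = target[:dev_size]
    test = target[dev_size:dev_size + test_size]
    # Leftover mobile images plus all auxiliary website images form the training set.
    train = target[dev_size + test_size:] + list(web)
    rng.shuffle(train)
    return train, dev, test


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    mobile = [f"mobile_{i}.jpg" for i in range(1000)]
    web = [f"web_{i}.jpg" for i in range(20000)]
    train, dev, test = make_splits(mobile, web, dev_size=250, test_size=250)
    print(len(train), len(dev), len(test))  # 20500 250 250
    print(all(name.startswith("mobile_") for name in dev + test))  # True
```

Shuffling all 21,000 images together and then carving out dev and test would instead fill those sets mostly with website images, so they would no longer measure performance on the mobile uploads the app actually has to handle.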

Updated 2026-07-19


References

Machine Learning Yearning (Deeplearning.ai)

Learn After

Avoid Randomly Shuffling Mixed-Source Data into Dev/Test Sets
Include Some Target-Distribution Examples in Training Alongside Auxiliary Data
Down-Weighting Auxiliary Data from a Different Distribution
Training Dev Set
Error Table Across Two Data Distributions and Three Error Types
Data Mismatch Between Training and Dev Set Distributions
Limited Practical Scope of Domain Adaptation for Different Data Distributions
Domain Adaptation for Different Data Distributions
Website Images and Mobile Phone Pictures as a Distribution Mismatch Example
Random 70/30 Train/Test Split Can Fail Under Distribution Shift
Which data should define the dev and test sets for the cat-picture app?
Dev and test sets should represent the future data distribution of interest.
Complete the principle: Dev and test sets should reflect _____ data.
Match each cat-app data group with its appropriate role or property.
Order the decisions for building datasets when auxiliary and target data differ.
Explain why different training and evaluation distributions can be appropriate.
Diagnose the evaluation-set mistake in a mobile cat-classification app.
Why did success on website images fail to ensure success on mobile uploads?
Which dataset design best uses both target and auxiliary cat images?
Using extra internet images for training requires internet images in dev and test sets.
