1Cademy - Different Dev/Test Distributions Make Failure Diagnosis Ambiguous

Learn Before

Choosing Dev and Test Sets from the Same Distribution When Possible

Concept

Different Dev/Test Distributions Make Failure Diagnosis Ambiguous

If dev and test sets come from different distributions, a system that works well on the dev set but poorly on the test set has several possible explanations: dev-set overfitting, a harder test set, or a test set that is different rather than harder.

Updated 2026-06-17

Contributors are:

Who are from:

References

Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)

Learn After

Mismatched Dev/Test Sets Can Waste Dev-Set Optimization Effort
Which is NOT listed in Machine Learning Yearning as a possible cause when a model does well on dev but poorly on the test set (different distributions)?
True or False: When dev and test sets come from different distributions, diagnosing why a model underperforms on the test set is straightforward.
Machine Learning Yearning warns that if dev and test sets come from different _____, a gap in performance leaves the cause of failure unclear.
Match each possible failure cause (when dev and test distributions differ) to its correct description from Machine Learning Yearning.
Order the three possible failure causes as they appear in Machine Learning Yearning when a model succeeds on dev but fails on test with mismatched distributions.
According to Machine Learning Yearning, what is the key implication if the test set is harder than the dev set when the two sets have different distributions?
True or False: According to Machine Learning Yearning, a lower test-set score compared to dev always means the test set is objectively harder.
Machine Learning Yearning states: 'So what works well on the _____ set just does not work well on the test set.'
Match each failure diagnosis (mismatched dev/test distributions) to the corrective implication it would suggest for a practitioner.
Order the reasoning steps that lead a practitioner to recognize diagnostic ambiguity when dev and test sets come from different distributions.
Diagnostic Ambiguity with Mismatched Dev/Test Distributions
Troubleshooting a Performance Drop on the Test Set
Three Causes of Poor Test Performance

Learn Before

Related

Learn After