Different Dev/Test Distributions Make Failure Diagnosis Ambiguous
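If the dev and test sets come from different distributions, a system that performs well on the dev set but poorly on the test set admits several possible explanations: it has overfit the dev set, the test set is harder than the dev set, or the test set is not necessarily harder but simply different, so that what works well on the dev set does not carry over to the test set.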
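The reasoning above can be written as a small lookup. This is only a sketch: the function name `candidate_explanations`, its parameters, and the `tolerance` threshold are hypothetical and chosen for illustration, not taken from Machine Learning Yearning. It shows that a dev/test gap leaves one candidate explanation when the distributions match and three when they differ.

```python
from typing import List


def candidate_explanations(
    dev_error: float,
    test_error: float,
    same_distribution: bool,
    tolerance: float = 0.01,  # hypothetical threshold for a "meaningful" gap
) -> List[str]:
    """Return the explanations that remain plausible for a dev/test gap."""
    # No meaningful gap between test and dev error: nothing to diagnose.
    if test_error - dev_error <= tolerance:
        return []

    # Same distribution: the gap points to dev-set overfitting.
    if same_distribution:
        return ["overfit the dev set"]

    # Different distributions: the gap alone cannot separate these causes.
    return [
        "overfit the dev set",
        "test set is harder than the dev set",
        "test set is not harder, just different from the dev set",
    ]


if __name__ == "__main__":
    # Illustrative error rates only; not measurements from any real system.
    print(candidate_explanations(0.05, 0.15, same_distribution=True))
    print(candidate_explanations(0.05, 0.15, same_distribution=False))
```

With matching distributions the single remaining explanation suggests a direct remedy (more dev data). With mismatched distributions the error numbers alone do not say which of the three applies.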
References
Machine Learning Yearning (Deeplearning.ai)
Tags
Machine Learning
Deep Learning
Machine Learning Strategy
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Related
Dev Set Should Reflect the Task to Improve Most
Same-Distribution Dev/Test Failure Indicates Dev Set Overfitting
Different Dev/Test Distributions Make Failure Diagnosis Ambiguous
Third-Party Benchmark Distribution Mismatch Increases Luck
When dev and test sets share the same distribution and test performance is worse than dev performance, what does this clearly indicate?
True or False: When dev and test sets come from different distributions, a performance gap between them has a single, unambiguous diagnosis.
When a system has overfit the dev set and both sets share the same distribution, the obvious cure is to get more _____ data.
Why should the dev set reflect the task a team wants to improve on the most?
If both sets share the same distribution and a model performs well on dev but poorly on test, the clear diagnosis is dev set overfitting.
When a model overfits the dev set and both sets share the same distribution, the obvious cure is to get more _____ data.
Match each dev/test set scenario to its consequence for model diagnosis.
Order the diagnostic steps when a model works well on the dev set but fails on the test set.
Which is a possible explanation for poor test performance when dev and test sets come from different distributions?
When dev and test sets come from different distributions, a system's failure on the test set provides an unambiguous diagnosis.
Once the dev and test sets are defined, a team will be focused on improving _____ set performance.
Match each concept related to dev/test distribution to its correct description.
Order the steps for selecting dev and test sets that support clear model evaluation.
Compare the diagnostics of poor test performance under same vs. different dev/test distributions.
Diagnosing a drop in test set performance with mismatched distributions.
Identify the diagnosis and cure for poor test performance when distributions match.
Learn After
Mismatched Dev/Test Sets Can Waste Dev-Set Optimization Effort
Which is NOT listed in Machine Learning Yearning as a possible cause when a model does well on dev but poorly on the test set (different distributions)?
True or False: When dev and test sets come from different distributions, diagnosing why a model underperforms on the test set is straightforward.
Machine Learning Yearning warns that if dev and test sets come from different _____, a gap in performance leaves the cause of failure unclear.
Match each possible failure cause (when dev and test distributions differ) to its correct description from Machine Learning Yearning.
Order the three possible failure causes as they appear in Machine Learning Yearning when a model succeeds on dev but fails on test with mismatched distributions.
According to Machine Learning Yearning, what is the key implication if the test set is harder than the dev set when the two sets have different distributions?
True or False: According to Machine Learning Yearning, a lower test-set score compared to dev always means the test set is objectively harder.
Machine Learning Yearning states: 'So what works well on the _____ set just does not work well on the test set.'
Match each failure diagnosis (mismatched dev/test distributions) to the corrective implication it would suggest for a practitioner.
Order the reasoning steps that lead a practitioner to recognize diagnostic ambiguity when dev and test sets come from different distributions.
Diagnostic Ambiguity with Mismatched Dev/Test Distributions
Troubleshooting a Performance Drop on the Test Set
Three Causes of Poor Test Performance