Compare the diagnostics of poor test performance under same vs. different dev/test distributions.
Question: Suppose a model performs well on the dev set but poorly on the test set. Analyze how the diagnosis of this issue and the options for fixing it differ when the dev and test sets come from the same distribution versus when they come from different distributions.
Sample answer: If the dev and test sets come from the same distribution, the diagnosis is clear: the system has overfit the dev set, and the obvious cure is to obtain more dev set data. If the dev and test sets come from different distributions, the diagnosis is ambiguous and options are less clear. The failure could be because the system overfit the dev set, the test set is harder than the dev set, or the algorithm is doing as well as could be expected.
Key points:
- Under the same distribution, poor test set performance clearly indicates the model has overfit the dev set.
- The obvious cure for dev set overfitting when distributions match is to collect more dev set data.
- If distributions differ, diagnosing poor test performance becomes ambiguous and options are less clear.
- Potential reasons for failure under different distributions include dev set overfitting, a harder test set, or the algorithm doing as well as expected.
Rubric: To receive full credit, the answer must identify that: 1. Under the same distribution, the diagnosis is overfitting the dev set and the cure is getting more dev set data. 2. Under different distributions, the diagnosis is ambiguous. 3. The potential causes under different distributions include: overfitting the dev set, the test set being harder, or the algorithm doing as well as could be expected.
0
1
References
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
Dev Set Should Reflect the Task to Improve Most
Same-Distribution Dev/Test Failure Indicates Dev Set Overfitting
Different Dev/Test Distributions Make Failure Diagnosis Ambiguous
Third-Party Benchmark Distribution Mismatch Increases Luck
When dev and test sets share the same distribution and test performance is worse than dev performance, what does this clearly indicate?
True or False: When dev and test sets come from different distributions, a performance gap between them has a single, unambiguous diagnosis.
When a system has overfit the dev set and both sets share the same distribution, the obvious cure is to get more _____ data.
Why should the dev set reflect the task a team wants to improve on the most?
If both sets share the same distribution and a model performs well on dev but poorly on test, the clear diagnosis is dev set overfitting.
When a model overfits the dev set and both sets share the same distribution, the obvious cure is to get more _____ data.
Match each dev/test set scenario to its consequence for model diagnosis.
Order the diagnostic steps when a model works well on the dev set but fails on the test set.
Which is a possible explanation for poor test performance when dev and test sets come from different distributions?
When dev and test sets come from different distributions, a system's failure on the test set provides an unambiguous diagnosis.
Once the dev and test sets are defined, a team will be focused on improving _____ set performance.
Match each concept related to dev/test distribution to its correct description.
Order the steps for selecting dev and test sets that support clear model evaluation.
Compare the diagnostics of poor test performance under same vs. different dev/test distributions.
Diagnosing a drop in test set performance with mismatched distributions.
Identify the diagnosis and cure for poor test performance when distributions match.