Diagnose the evaluation issue when a team corrects dog breed labels in a development set but leaves the test set unchanged.
Case context: A team is developing a classifier to identify dog breeds. During error analysis, they find that 5% of the development set images are mislabeled, so they hire experts to correct all labels in the development set. However, to save budget, they decide not to inspect or correct the labels in the test set. They train their classifier, achieving 98% accuracy on the corrected development set, but only 91% on the test set.
Question: Diagnose why there is a discrepancy in performance and decide what action the team should take to ensure their evaluation is valid.
Sample answer: The discrepancy arises because only the dev set labels were corrected, so the dev and test sets are no longer drawn from the same distribution. The team optimized the classifier against corrected labels, but it was evaluated against uncorrected ones. To fix this, the team must apply the same label-correcting process to the test set labels so that both sets continue to be drawn from the same distribution and are scored against the same evaluation criterion.
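The effect described in the sample answer can be illustrated with a minimal sketch. It uses hypothetical toy data: 1,000 images, three breeds encoded as integers, a classifier with a 2% true error rate, and 5% label noise that never overlaps the classifier's mistakes. None of these numbers come from the case, and they are not meant to reproduce its 91% figure. They only show how identical predictions can score differently against corrected and uncorrected labels.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the given labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


# Hypothetical toy data: 1000 images, 3 breeds encoded as 0, 1, 2.
n = 1000
true_labels = [i % 3 for i in range(n)]

# Assumption: the classifier is wrong on every 50th image (2% true error).
predictions = [(y + 1) % 3 if i % 50 == 0 else y
               for i, y in enumerate(true_labels)]

# Assumption: uncorrected labels are wrong on 1 image in 20 (5% label noise),
# at positions chosen so they never coincide with the classifier's mistakes.
uncorrected_labels = [(y + 2) % 3 if i % 20 == 7 else y
                      for i, y in enumerate(true_labels)]

# Same predictions, two different evaluation criteria.
clean_accuracy = accuracy(predictions, true_labels)
noisy_accuracy = accuracy(predictions, uncorrected_labels)

print(clean_accuracy)
print(noisy_accuracy)
```

Under these assumptions the same predictions score 0.98 against corrected labels and 0.93 against uncorrected ones, so a gap can appear without any change in the model.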
Key points:
- Identify that the dev and test sets are drawn from different distributions due to inconsistent label correction.
- Explain that the team optimized for dev set performance only to be judged on a different test set criterion.
- Recommend applying the same label-correcting process to the test set labels.
Rubric: The response must identify that the dev and test sets are no longer drawn from the same distribution because the label correction was only applied to one of them. It must recommend applying the same label-fixing process to the test set labels to align the evaluation criteria.
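One way to picture the recommended fix is to route both splits through a single correction step. The sketch below assumes labels are stored as a mapping from image id to breed and that the expert review produces a mapping of corrections. The function name `apply_label_corrections`, the ids, and the breeds are all hypothetical.

```python
def apply_label_corrections(labels, corrections):
    """Return a copy of `labels` with expert corrections applied.

    `labels` maps image id -> breed.
    `corrections` maps image id -> corrected breed; ids not present
    in `labels` are ignored.
    """
    fixed = dict(labels)
    for image_id, breed in corrections.items():
        if image_id in fixed:
            fixed[image_id] = breed
    return fixed


# Hypothetical labels for the two evaluation splits.
dev_labels = {"dev_001": "beagle", "dev_002": "husky"}
test_labels = {"test_001": "poodle", "test_002": "beagle"}

# Hypothetical output of one expert review pass that covers both splits.
expert_corrections = {"dev_002": "malamute", "test_002": "basset hound"}

# The same process is applied to dev and test, so both sets are scored
# against labels produced the same way.
dev_fixed = apply_label_corrections(dev_labels, expert_corrections)
test_fixed = apply_label_corrections(test_labels, expert_corrections)
```

Using one function for both splits makes it harder to correct one set and forget the other, which is the failure described in the case.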
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI
Related
Why must the same label-fixing process applied to the dev set also be applied to the test set?
Fixing only dev set labels without applying the same process to the test set can cause the two sets to be drawn from different distributions.
Whatever process you apply to fixing dev set labels, you must also apply it to the _____ labels.
Match each label-fixing scenario to its consequence for dev/test set evaluation.
Order the steps for correctly fixing mislabeled examples while keeping dev and test sets from the same distribution.
What is the primary risk when a team optimizes against a dev set whose labels were fixed differently from the test set?
Applying different label-fixing methods to dev and test sets is acceptable as long as both sets achieve high overall label accuracy.
Fixing dev and test set labels together prevents the team from optimizing for dev set performance only to be judged on a _____ criterion.
Match each key concept from the label-fixing principle to its correct definition.
Order the events that lead to misaligned evaluation when only dev set labels are fixed and not test set labels.
Analyze the consequences of using different label-fixing processes for the development and test sets in a machine learning project.
Diagnose the evaluation issue when a team corrects dog breed labels in a development set but leaves the test set unchanged.
State the primary reason for maintaining consistency in the label-fixing process across both dev and test sets.