Decide whether to clean dev set labels for a highly accurate classifier.
Case context: You are building an image classifier. Early on, you noticed some mislabeled images in your dev set but decided not to fix them. Now, your system has achieved a very low overall error rate. However, upon investigating a sample of the remaining errors, you find that a substantial percentage (around 30%) are actually due to mislabeled dev set images rather than algorithm mistakes.
Question: Based on the current state of your classifier, what should your team's next step be regarding the dev set, and how does the current error breakdown justify this decision?
Sample answer: The team should now invest time in improving the quality of the dev set labels. Because the classifier is highly accurate, mislabeled examples make up a large fraction (about 30%) of the remaining errors. At this point label noise distorts the accuracy estimate enough that the team cannot tell whether the classifier's true error is closer to 1.4% or 2%: with a measured dev error of 2%, roughly 0.6% of the dev set is mislabeled and only about 1.4% reflects algorithm mistakes. That is a large relative difference when comparing models or choosing what to work on next.
Key points:
- Recommend cleaning the dev set labels.
- Identify that mislabeled examples now form a significant relative fraction of total errors.
- Note that this noise prevents accurate estimation of model performance.
Rubric: The response should recommend fixing the mislabeled dev set examples and justify this by stating that the high relative percentage of errors caused by bad labels significantly distorts accuracy estimates.
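The arithmetic behind the sample answer can be checked with a minimal sketch. It assumes a measured dev error of 2% (the figure implied by the 1.4% versus 2% comparison); the function names are hypothetical, and the 10% error used for the earlier stage is an illustrative assumption, not a number from the case.

```python
def error_breakdown(dev_error, mislabeled_fraction):
    """Split measured dev error into the part caused by bad labels
    and the part caused by everything else (algorithm mistakes)."""
    label_error = dev_error * mislabeled_fraction
    other_error = dev_error - label_error
    return label_error, other_error


def mislabeled_share(dev_error, label_error):
    """Fraction of all measured errors that come from bad labels."""
    return label_error / dev_error


# Current state: ~2% dev error, 30% of errors traced to mislabeled images.
label_err, other_err = error_breakdown(dev_error=0.02, mislabeled_fraction=0.30)
print(f"label errors: {label_err:.2%}, other errors: {other_err:.2%}")

# Hypothetical earlier stage (assumption): same mislabeled images,
# but a much weaker classifier with 10% dev error.
early_share = mislabeled_share(dev_error=0.10, label_error=label_err)
late_share = mislabeled_share(dev_error=0.02, label_error=label_err)
print(f"share of errors from labels: early {early_share:.0%}, now {late_share:.0%}")
```

The absolute number of mislabeled images stays the same in both cases; only their share of the remaining errors changes, which is why cleaning the labels becomes worthwhile late rather than early.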
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI
Related
Why do mislabeled dev set examples become more impactful as a classifier improves?
It is acceptable to initially tolerate mislabeled dev/test examples and reconsider that decision as the system improves.
When mislabeled dev examples account for _____ of all errors, improving dev-set label quality becomes worthwhile.
Match each scenario to its correct implication regarding mislabeled dev set examples.
Order the reasoning steps for deciding whether to invest in fixing mislabeled dev set labels.
A classifier has ~2% dev error; 30% of those errors stem from mislabeled dev images. What should you do?
The difference between a classifier error of 1.4% and 2% is a minor detail with little practical significance.
As a classifier improves, the fraction of errors due to mislabeled dev examples _____ relative to total errors.
Match each concept to its role in the growing relative impact of mislabeled dev examples.
Order the stages of how mislabeled dev examples grow in importance across a classifier's development lifecycle.
Analyze the changing impact of mislabeled dev examples over a model's lifecycle.
Explain why dev set label quality becomes more important over time.