Analyze the changing impact of mislabeled dev examples over a model's lifecycle.
Question: Discuss why it is common to initially tolerate mislabeled examples in a dev/test set but later choose to fix them. What drives this shift in strategy as the machine learning system improves?
Sample answer: Initially, when a system's overall error rate is high, mislabeled dev examples make up a small fraction of total errors and do not significantly distort accuracy estimates. As the system improves and the overall error drops, the absolute number of errors decreases. Consequently, the remaining mislabeled dev examples constitute a larger fraction of the total errors. At this stage, they introduce significant noise, making it difficult to distinguish between true improvements (e.g., 1.4% vs 2% error). Thus, it becomes worthwhile to invest in cleaning the dev set labels.
Key points:
- Initial high error dilutes the impact of mislabeled examples.
- As the system improves, the relative fraction of mislabeled errors grows.
- Mislabeled examples eventually add significant noise to accuracy estimates.
- Cleaning labels becomes necessary to reliably measure further improvements.
Rubric: The answer must explain the shift from tolerating to fixing labels based on the relative fraction of total errors. It should mention that as overall error decreases, mislabeled examples distort accuracy estimates, making further evaluation difficult.
0
1
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI
Related
Why do mislabeled dev set examples become more impactful as a classifier improves?
It is acceptable to initially tolerate mislabeled dev/test examples and reconsider that decision as the system improves.
When mislabeled dev examples account for _____ of all errors, improving dev-set label quality becomes worthwhile.
Match each scenario to its correct implication regarding mislabeled dev set examples.
Order the reasoning steps for deciding whether to invest in fixing mislabeled dev set labels.
A classifier has ~2% dev error; 30% of those errors stem from mislabeled dev images. What should you do?
The difference between a classifier error of 1.4% and 2% is a minor detail with little practical significance.
As a classifier improves, the fraction of errors due to mislabeled dev examples _____ relative to total errors.
Match each concept to its role in the growing relative impact of mislabeled dev examples.
Order the stages of how mislabeled dev examples grow in importance across a classifier's development lifecycle.
Analyze the changing impact of mislabeled dev examples over a model's lifecycle.
Decide whether to clean dev set labels for a highly accurate classifier.
Explain why dev set label quality becomes more important over time.