Essay

Analyze the changing impact of mislabeled dev examples over a model's lifecycle.

Question: Discuss why it is common to initially tolerate mislabeled examples in a dev/test set but later choose to fix them. What drives this shift in strategy as the machine learning system improves?

Sample answer: Initially, when a system's overall error rate is high, mislabeled dev examples make up a small fraction of total errors and do not significantly distort accuracy estimates. As the system improves and the overall error drops, the absolute number of errors decreases. Consequently, the remaining mislabeled dev examples constitute a larger fraction of the total errors. At this stage, they introduce significant noise, making it difficult to distinguish between true improvements (e.g., 1.4% vs 2% error). Thus, it becomes worthwhile to invest in cleaning the dev set labels.

Key points:

  • Initial high error dilutes the impact of mislabeled examples.
  • As the system improves, the relative fraction of mislabeled errors grows.
  • Mislabeled examples eventually add significant noise to accuracy estimates.
  • Cleaning labels becomes necessary to reliably measure further improvements.

Rubric: The answer must explain the shift from tolerating to fixing labels based on the relative fraction of total errors. It should mention that as overall error decreases, mislabeled examples distort accuracy estimates, making further evaluation difficult.

0

1

Updated 2026-06-12

Contributors are:

Who are from:

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI