1Cademy - Decide whether to clean dev set labels for a highly accurate classifier.

Learn Before

Growing Relative Impact of Mislabeled Dev Examples

Case Study


Case context: You are building an image classifier. Early on, you noticed some mislabeled images in your dev set but decided not to fix them. Your system now has a very low overall error rate. However, when you inspect a sample of the remaining errors, you find that about 30% of them are caused by mislabeled dev set images rather than by mistakes of the algorithm.
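The error-analysis step in the case context can be sketched as a simple tally. This is a minimal Python sketch, assuming each misclassified dev image in the reviewed sample has been tagged by hand with a cause. The function name `error_breakdown`, the tags `mislabeled`, `blurry`, and `other`, and the sample counts are all hypothetical, chosen only to reproduce the roughly 30% figure.

```python
from collections import Counter


def error_breakdown(reviewed_errors):
    """Return the fraction of reviewed dev errors attributed to each cause.

    reviewed_errors: list of cause tags, one per misclassified dev example
    that a person inspected by hand. The tag names are hypothetical.
    """
    total = len(reviewed_errors)
    if total == 0:
        return {}  # nothing reviewed yet, so no breakdown to report
    counts = Counter(reviewed_errors)
    return {cause: count / total for cause, count in counts.items()}


# Hypothetical reviewed sample: 100 errors, 30 traced to bad dev labels.
sample = ["mislabeled"] * 30 + ["blurry"] * 45 + ["other"] * 25
breakdown = error_breakdown(sample)
print(f"mislabeled share: {breakdown['mislabeled']:.0%}")  # mislabeled share: 30%
```

Because the fraction is estimated from a sample rather than from every dev error, reviewing more errors gives a more reliable estimate.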

Question: Based on the current state of your classifier, what should your team's next step be regarding the dev set, and how does the current error breakdown justify this decision?

Sample answer: The team should now invest time in improving the quality of the dev set labels. Because the classifier is highly accurate, the mislabeled examples make up a large fraction (about 30%) of the remaining errors. This adds significant noise to the accuracy estimate and prevents the team from distinguishing small but important performance differences. For example, if the measured dev error is 2% and 30% of those errors come from bad labels (0.6 percentage points), the classifier's true error may be closer to 1.4%, a large relative difference that the noisy labels hide.
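The arithmetic behind the 1.4% versus 2% comparison can be sketched as follows. This is a minimal sketch, assuming a measured dev error of 2% and assuming that every error traced to a bad label would disappear once that label is fixed, which is a simplification. `decompose_dev_error` is a hypothetical helper, not part of any library.

```python
def decompose_dev_error(measured_error, mislabeled_fraction):
    """Split a measured dev error rate into label-noise and algorithm parts.

    measured_error: overall dev error as a fraction (0.02 means 2%).
    mislabeled_fraction: share of errors traced to bad dev labels, estimated
        from a hand-reviewed sample (0.30 means 30%).
    Assumes each error caused by a bad label would be scored correct after
    relabeling (a simplifying assumption).
    """
    if not 0.0 <= mislabeled_fraction <= 1.0:
        raise ValueError("mislabeled_fraction must be between 0 and 1")
    label_noise = measured_error * mislabeled_fraction
    algorithm_error = measured_error - label_noise
    return label_noise, algorithm_error


noise, algo = decompose_dev_error(0.02, 0.30)
print(f"label noise: {noise:.1%}, algorithm: {algo:.1%}")
# label noise: 0.6%, algorithm: 1.4%
```

The gap between 2% and 1.4% is small in absolute terms but is about 30% of the measured error, which is why it matters when comparing strong models.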

Key points:

Recommend cleaning the dev set labels.
Identify that mislabeled examples now form a significant relative fraction of total errors.
Note that this noise prevents accurate estimation of model performance.
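Why the relative share of label errors grows can be illustrated with a hypothetical trajectory. This sketch assumes the bad labels contribute a fixed 0.6 percentage points of measured error while the algorithm's own error falls; `mislabeled_share` and the error values in the loop are illustrative assumptions, not measurements.

```python
def mislabeled_share(label_noise, algorithm_error):
    """Fraction of measured dev errors caused by bad labels.

    label_noise: error contributed by mislabeled dev examples (0.006 = 0.6 points).
    algorithm_error: error from genuine algorithm mistakes.
    Assumes the two contributions simply add up to the measured error.
    """
    total = label_noise + algorithm_error
    if total == 0:
        return 0.0  # no measured errors at all
    return label_noise / total


# Hypothetical trajectory: label noise stays at 0.6 points while the
# algorithm's own error shrinks as the model improves.
for algo in (0.094, 0.044, 0.014):
    share = mislabeled_share(0.006, algo)
    print(f"measured error {0.006 + algo:.1%}: {share:.0%} from bad labels")
```

As the algorithm improves, the same set of bad labels accounts for a growing fraction of the measured errors, which is why a cleanup that was not worth doing early can become worthwhile later.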

Rubric: The response should recommend fixing the mislabeled dev set examples and justify this by stating that the high relative percentage of errors caused by bad labels significantly distorts accuracy estimates.


Updated 2026-06-12
