1Cademy - Explain how selective label correction on misclassified examples alters estimated dev set performance.

Learn Before

Bias from Fixing Labels of Only Misclassified Dev Examples

Essay

Question: Analyze the mathematical or statistical impact of only fixing labels on dev set examples that your model misclassified. How does this practice skew the measured performance compared to the true performance of the system?

Sample answer: Correcting labels only on examples the model misclassified can only turn measured errors into correct classifications: whenever the original label was wrong and the model's prediction was right, the fix removes an error from the count. Mislabeled examples where the model's prediction happened to match the incorrect label are never inspected, so they keep counting as correct even though they are real errors. The measured error count can therefore decrease but never increase. This inflates the measured accuracy and creates an optimistic bias: the dev set accuracy looks higher than the system's true accuracy.

Key points:

Only the labels of misclassified dev examples are inspected and corrected.
Mislabeled dev examples where the model's prediction matched the incorrect label (and so were scored as correct) remain uninspected and uncorrected.
This selective process artificially inflates the measured dev set performance, introducing evaluation bias.

Rubric: The response must explain: 1. Only error counts are reduced because only misclassified examples are inspected. 2. Correctly classified examples with incorrect labels are left uncorrected. 3. This selective correction results in an artificially high (optimistically biased) measured dev set accuracy.
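
The three rubric points can be checked with simple counting. The sketch below is only an illustration under stated assumptions: a binary classifier, a dev set split into four hypothetical groups by whether the label and the prediction are right, and made-up counts. The names `DevSetCounts` and `accuracy_report` are invented for this example and do not come from the reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DevSetCounts:
    """Hypothetical breakdown of a binary-classification dev set."""
    clean_label_model_right: int  # label correct, prediction correct
    clean_label_model_wrong: int  # label correct, prediction wrong (real error)
    bad_label_model_right: int    # label wrong, prediction right (scored as an error)
    bad_label_model_agrees: int   # label wrong, prediction matches it (hidden real error)

    @property
    def total(self) -> int:
        return (self.clean_label_model_right + self.clean_label_model_wrong
                + self.bad_label_model_right + self.bad_label_model_agrees)


def accuracy_report(c: DevSetCounts) -> dict:
    """Accuracy of the same predictions under three labeling regimes."""
    # Original noisy labels: a prediction is scored correct if it matches the label.
    noisy_correct = c.clean_label_model_right + c.bad_label_model_agrees
    # Selective fix: only misclassified examples are reviewed, so the
    # "bad label, model right" group flips to correct; hidden errors stay hidden.
    selective_correct = noisy_correct + c.bad_label_model_right
    # Full fix: every label is corrected, so hidden errors are scored as errors.
    true_correct = c.clean_label_model_right + c.bad_label_model_right
    n = c.total
    return {
        "noisy": noisy_correct / n,
        "selective": selective_correct / n,
        "true": true_correct / n,
        "bias": (selective_correct - true_correct) / n,
    }


# Made-up counts chosen only to illustrate the effect.
counts = DevSetCounts(
    clean_label_model_right=900,
    clean_label_model_wrong=60,
    bad_label_model_right=25,
    bad_label_model_agrees=15,
)
report = accuracy_report(counts)
for name, value in report.items():
    print(f"{name:>9}: {value:.3f}")
```

With these illustrative counts, the original labels give 91.5% accuracy, selective correction gives 94.0%, and fully corrected labels give 92.5%. The gap between the selective and true figures equals the share of hidden errors (15 of 1,000 examples), which is exactly the group that selective correction never inspects.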

Updated 2026-06-18

References

Machine Learning Yearning (Deeplearning.ai)
