Evaluating a team's decision to correct only misclassified dev set labels.
Case context: A development team is working on an image recognition system. To improve their dev set quality, they review only the images that the model misclassified and correct any label errors they find. They claim this makes their dev set evaluation completely unbiased.
Question: Diagnose the flaw in the team's label-correction process. What decision should they make to ensure their dev set evaluation remains unbiased?
Sample answer: The team's process is flawed because it introduces evaluation bias. By fixing labels only on misclassified examples, they never review mislabeled examples where the model's prediction happened to match the wrong label, so those examples still count as correct. Each correction can only turn a counted error into a counted success, never the reverse, so measured dev set accuracy is inflated. To keep the evaluation unbiased, they should also review and correct a sample of the correctly classified dev examples, or leave the labels unchanged if they cannot check both groups.
Key points:
- Diagnose that correcting only misclassified labels introduces bias into the evaluation.
- Identify that mislabeled examples that were correctly classified are missed.
- Decide to inspect labels of both misclassified and correctly classified examples to maintain an unbiased dev set.
Rubric: The answer must diagnose the bias introduced by correcting only misclassified labels and recommend either checking both misclassified and correctly classified examples or keeping the dev set as-is to avoid bias.
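Illustration: the bias described in the sample answer can be seen in a small simulation. This is a minimal sketch under stated assumptions, not the team's actual pipeline: it assumes a binary task, labels flipped at random, and a model whose errors are independent of the label noise. The function name `simulate` and the parameters `model_acc` and `label_noise` are hypothetical and chosen only for this example.

```python
import random


def simulate(n=10_000, model_acc=0.90, label_noise=0.05, seed=0):
    """Compare measured dev accuracy under three labeling states (toy binary task)."""
    rng = random.Random(seed)

    # Ground-truth labels, model predictions, and the recorded (noisy) dev labels.
    # Assumption: model errors and label errors are independent random flips.
    true = [rng.randint(0, 1) for _ in range(n)]
    pred = [t if rng.random() < model_acc else 1 - t for t in true]
    recorded = [1 - t if rng.random() < label_noise else t for t in true]

    def accuracy(labels):
        # Fraction of examples where the prediction matches the given labels.
        return sum(p == y for p, y in zip(pred, labels)) / n

    # Selective correction: fix the label only where the model disagrees with
    # the recorded label, i.e. only on examples counted as misclassified.
    selective = [t if p != y else y for t, p, y in zip(true, pred, recorded)]

    return {
        "noisy": accuracy(recorded),        # labels left as-is
        "selective": accuracy(selective),   # only misclassified examples fixed
        "fully_corrected": accuracy(true),  # every label reviewed and fixed
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:16s} {value:.4f}")
```

In this toy setup the selective figure can never fall below the fully corrected one: every example the model truly gets right is still counted as right, while examples where a wrong prediction matches a wrong label are never reviewed and keep counting as correct.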
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
Practical Convenience Causes Label-Correction Bias in Dev Sets
When Label-Correction Bias Is Acceptable Versus Problematic
What risk arises when you fix label errors only for the dev-set examples your classifier misclassified?
True or False: Correcting mislabeled dev-set examples only where your system was wrong produces an unbiased evaluation.
Fixing labels only on examples your system _____ can introduce bias into dev-set evaluation.
Why does fixing labels only on misclassified dev examples introduce bias into the evaluation?
Fixing labels only on misclassified dev examples can introduce bias into your evaluation.
To avoid label-correction bias, you should review labels of _____ dev examples, not only misclassified ones.
Match each label-correction practice to its effect on dev set evaluation bias.
Order the steps a team should follow when correcting dev set labels to avoid introducing bias.
What is the most likely effect on measured dev set accuracy when labels are corrected only on misclassified examples?
Reviewing only the dev examples your model misclassified is sufficient to ensure an unbiased dev set evaluation.
Label-correction bias arises because mislabeled examples the system classified _____ are never reviewed or fixed.
Match each term related to label-correction bias to its correct definition.
Order the reasoning steps that explain why fixing only misclassified labels introduces bias into dev set evaluation.
Explain how selective label correction on misclassified examples alters estimated dev set performance.
State the primary risk of fixing only misclassified dev set labels.