1Cademy - Evaluating a team's decision to correct only misclassified dev set labels.

Learn Before

Bias from Fixing Labels of Only Misclassified Dev Examples

Case Study

Evaluating a team's decision to correct only misclassified dev set labels.

Case context: A development team is working on an image recognition system. To improve their dev set quality, they review only the images that the model misclassified and correct any label errors they find. They claim this makes their dev set evaluation completely unbiased.

Question: Diagnose the flaw in the team's label-correction process. What decision should they make to ensure their dev set evaluation remains unbiased?

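Sample answer: The team's process is flawed because it introduces evaluation bias. By fixing labels only on misclassified examples, they never review mislabeled examples where the model's prediction happened to match the wrong label, so those examples still count as 'correct'. Label errors that hurt the model's score get fixed while label errors that help it stay in place, which tends to inflate the measured accuracy. To keep the evaluation unbiased, they should also review and correct labels among the correctly classified dev examples (at least a sample of them), or leave the labels untouched if they cannot check both groups.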
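The effect described in the sample answer can be illustrated with a small worked example. The sketch below is not from the source: the ten-example dev set, the `cat`/`dog` labels, and the helper names `accuracy` and `fix_only_misclassified` are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical toy dev set. Each row is
# (true_label, recorded_label, model_prediction).
dev = [
    ("cat", "cat", "cat"),  # clean label, model right
    ("cat", "cat", "cat"),
    ("dog", "dog", "dog"),
    ("dog", "dog", "dog"),
    ("cat", "cat", "dog"),  # clean label, model wrong
    ("dog", "dog", "cat"),
    ("cat", "dog", "cat"),  # wrong label, model actually right
    ("dog", "cat", "dog"),
    ("cat", "dog", "dog"),  # wrong label, model agrees with the wrong label
    ("dog", "cat", "cat"),
]


def accuracy(labels, preds):
    """Fraction of predictions that match the given labels."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def fix_only_misclassified(rows):
    """Correct a label only when the model disagreed with the recorded label."""
    return [true if pred != rec else rec for true, rec, pred in rows]


preds = [pred for _, _, pred in dev]
true_labels = [true for true, _, _ in dev]
recorded_labels = [rec for _, rec, _ in dev]
partially_fixed = fix_only_misclassified(dev)

acc_true = accuracy(true_labels, preds)         # all labels corrected
acc_recorded = accuracy(recorded_labels, preds) # labels left as-is
acc_partial = accuracy(partially_fixed, preds)  # the team's process

print(f"all labels fixed:        {acc_true:.1f}")
print(f"labels left untouched:   {acc_recorded:.1f}")
print(f"only misclassified fixed: {acc_partial:.1f}")
```

In this toy set the model's accuracy against fully corrected labels is 0.6, and it is also 0.6 against the untouched labels because the label errors happen to cancel out. Fixing only the misclassified examples raises the measured accuracy to 0.8: the two label errors that counted against the model are repaired, while the two that counted in its favor are never looked at.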

Key points:

Diagnose that correcting only misclassified labels introduces bias into the evaluation.
Identify that mislabeled examples the model appeared to classify correctly are never reviewed, so their label errors remain.
Decide to inspect labels of both misclassified and correctly classified examples to maintain an unbiased dev set.

Rubric: The answer must diagnose the bias introduced by correcting only misclassified labels and recommend either checking both misclassified and correctly classified examples or keeping the dev set as-is to avoid bias.


Updated 2026-06-18


References

Machine Learning Yearning (Deeplearning.ai)
