Learn Before
Analyze the consequences of using different label-fixing processes for the development and test sets in a machine learning project.
Question: Suppose a machine learning team decides to manually inspect and correct mislabeled examples in their development (dev) set, but they leave the test set labels untouched. Analyze the consequences of this decision on the model development process, focusing on the distribution of the two datasets and the evaluation criteria used to judge final performance.
Sample answer: If the team corrects mislabeled examples only in the dev set, the dev and test sets will no longer be drawn from the same distribution: the dev labels reflect the corrected ground truth, while the test labels still contain the original errors. The team will then tune and select models against the corrected dev labels, but the final model will be judged against the uncorrected test labels, which is a different criterion. Because of this misalignment, a model can look strong during development yet score lower on the test set, since its correct predictions on mislabeled test examples are counted as errors.
Key points:
- Fixing only dev set labels leads to the dev and test sets being drawn from different distributions.
- The team will optimize model performance based on the corrected dev set criterion.
- The final model will be evaluated and judged on a different criterion (the uncorrected test set).
- This misalignment means that gains measured on the dev set may not carry over to the test-set evaluation.
Rubric: The answer must explain that fixing only dev set labels causes the dev and test sets to be drawn from different distributions. It must also explain that this leads the team to optimize for a dev set performance criterion that differs from the test set criterion used for final evaluation.
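Illustration: the mismatch described in this answer can be sketched with a small toy simulation. Everything in it is an assumption made for illustration and does not come from the source: the alternating binary labels, the 10% systematic labeling error, and the two hypothetical "models" (one that predicts the true label, one that reproduces the labeling error). It is a minimal sketch of the effect, not a realistic experiment.

```python
def make_true_labels(n):
    """Hypothetical ground-truth labels: alternating 0 and 1."""
    return [i % 2 for i in range(n)]


def add_label_errors(labels, every=10):
    """Flip every `every`-th label to mimic systematic labeling errors (assumed 10%)."""
    return [1 - y if i % every == 0 else y for i, y in enumerate(labels)]


def accuracy(preds, labels):
    """Fraction of predictions that match the given labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


n = 1000
truth = make_true_labels(n)

# Dev set: labels were manually corrected, so they match the ground truth.
dev_labels_fixed = truth
# Test set: labels were left untouched, so they still contain the errors.
test_labels_raw = add_label_errors(truth)

# Two hypothetical models (assumptions for illustration only):
# model_clean predicts the true label for every example;
# model_noisy has learned to reproduce the systematic labeling error.
model_clean_preds = truth
model_noisy_preds = add_label_errors(truth)

# The corrected dev set prefers model_clean ...
dev_acc_clean = accuracy(model_clean_preds, dev_labels_fixed)
dev_acc_noisy = accuracy(model_noisy_preds, dev_labels_fixed)

# ... but the uncorrected test set prefers model_noisy.
test_acc_clean = accuracy(model_clean_preds, test_labels_raw)
test_acc_noisy = accuracy(model_noisy_preds, test_labels_raw)

print(f"dev  (fixed labels): clean={dev_acc_clean:.2f}, noisy={dev_acc_noisy:.2f}")
print(f"test (raw labels):   clean={test_acc_clean:.2f}, noisy={test_acc_noisy:.2f}")
```

In this toy setup the model ranking reverses between the two sets: the model chosen on the corrected dev set is the one penalized by the uncorrected test set. Applying the same correction process to the test labels would make both sets score the models by the same criterion.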
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI
Related
Why must the same label-fixing process applied to the dev set also be applied to the test set?
Fixing only dev set labels without applying the same process to the test set can cause the two sets to be drawn from different distributions.
Whatever process you apply to fixing dev set labels, you must also apply it to the _____ labels.
Match each label-fixing scenario to its consequence for dev/test set evaluation.
Order the steps for correctly fixing mislabeled examples while keeping dev and test sets from the same distribution.
What is the primary risk when a team optimizes against a dev set whose labels were fixed differently from the test set?
Applying different label-fixing methods to dev and test sets is acceptable as long as both sets achieve high overall label accuracy.
Fixing dev and test set labels together prevents the team from optimizing for dev set performance only to be judged on a _____ criterion.
Match each key concept from the label-fixing principle to its correct definition.
Order the events that lead to misaligned evaluation when only dev set labels are fixed and not test set labels.
Analyze the consequences of using different label-fixing processes for the development and test sets in a machine learning project.
Diagnose the evaluation issue when a team corrects dog breed labels in a development set but leaves the test set unchanged.
State the primary reason for maintaining consistency in the label-fixing process across both dev and test sets.