Essay

Analyze the consequences of using different label-fixing processes for the development and test sets in a machine learning project.

Question: Suppose a machine learning team decides to manually inspect and correct mislabeled examples in their development (dev) set, but they leave the test set labels untouched. Analyze the consequences of this decision on the model development process, focusing on the distribution of the two datasets and the evaluation criteria used to judge final performance.

Sample answer: If the team corrects mislabeled examples only in the dev set, the dev and test sets will no longer be drawn from the same distribution: the dev labels have been cleaned, while the test labels still contain the original errors. The team will then tune and select models against the corrected dev set labels, but the final model will be evaluated on the uncorrected test set, which is a different criterion. Because of this misalignment, a model that performs well during development can fail to meet expectations on the test set.
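The gap between the two criteria can be illustrated with a small simulation. This is only a sketch under stated assumptions, not part of the original answer: the labels are binary, 10% of the originally collected labels are flipped at random, manual correction of the dev set is assumed to be perfect, and the "model" is a hypothetical stand-in that disagrees with the true label 5% of the time. The names `simulate`, `label_noise`, and `model_error` are invented for this example.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the given labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def simulate(n=10_000, label_noise=0.10, model_error=0.05, seed=0):
    """Score one fixed model on a corrected dev set and an uncorrected test set."""
    rng = random.Random(seed)
    scores = {}
    for split in ("dev", "test"):
        true_labels = [rng.randint(0, 1) for _ in range(n)]
        # Labels as originally collected: each is flipped with probability label_noise.
        noisy_labels = [1 - y if rng.random() < label_noise else y for y in true_labels]
        # Hypothetical model: matches the true label except with probability model_error.
        preds = [1 - y if rng.random() < model_error else y for y in true_labels]
        # Assumption: only the dev set is (perfectly) corrected; test keeps noisy labels.
        eval_labels = true_labels if split == "dev" else noisy_labels
        scores[split] = accuracy(preds, eval_labels)
    return scores


if __name__ == "__main__":
    for split, score in simulate().items():
        print(f"{split} accuracy: {score:.3f}")
```

Under these assumptions the same model is expected to score about 0.95 on the corrected dev set but only about 0.95 × 0.90 + 0.05 × 0.10 = 0.86 on the uncorrected test set, even though its true quality is identical on both. The drop comes entirely from the differing label-fixing process, not from the model.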

Key points:

  • Fixing only dev set labels leads to the dev and test sets being drawn from different distributions.
  • The team will optimize model performance based on the corrected dev set criterion.
  • The final model will be evaluated and judged on a different criterion (the uncorrected test set).
  • This misalignment means that performance optimized on the dev set may fail to carry over to the test-set evaluation.

Rubric: The answer must explain that fixing only dev set labels causes the dev and test sets to be drawn from different distributions. It must also explain that this leads the team to optimize for a dev set performance criterion that differs from the test set criterion used for final evaluation.


Updated 2026-06-07


Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI