Case Study

Propose a dev set splitting strategy to balance manual error analysis with objective performance evaluation.

Case context: A machine learning team is working on a system with a 5,000-example dev set. They want to perform manual error analysis to identify major error categories and make progress. However, they also need to ensure that their final dev set performance evaluation remains completely objective and free from overfitting to the specific examples they manually inspect.

Question: Based on this scenario and the principles of dev set splitting, describe the strategy the team should implement. Identify the two subsets they should create, explain their respective purposes, and state the rules regarding manual examination for each.

Sample answer: The team should split the 5,000-example dev set into two subsets: an Eyeball dev set and a Blackbox dev set. The Eyeball dev set (e.g., 10% of the dev set, or about 500 examples) is designated for manual error analysis: the team inspects the misclassified examples in this subset to identify major error categories. The Blackbox dev set is the remaining portion (about 4,500 examples) and must not be manually examined, so that performance measured on it is not biased toward the examples the team has looked at.
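
The split described in the sample answer can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed procedure: the function name `split_dev_set`, the `eyeball_fraction` and `seed` parameters, and the use of integer indices to stand in for dev set examples are all assumptions made for this example.

```python
import random


def split_dev_set(dev_indices, eyeball_fraction=0.10, seed=0):
    """Randomly partition dev set indices into Eyeball and Blackbox subsets.

    Hypothetical helper: the name and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(dev_indices)
    rng.shuffle(shuffled)          # random assignment avoids a biased subset

    n_eyeball = round(len(shuffled) * eyeball_fraction)
    eyeball = sorted(shuffled[:n_eyeball])    # may be inspected manually
    blackbox = sorted(shuffled[n_eyeball:])   # metrics only, never inspected
    return eyeball, blackbox


# The case study's 5,000-example dev set, represented here by indices 0..4999.
eyeball_ids, blackbox_ids = split_dev_set(range(5000))
print(len(eyeball_ids), len(blackbox_ids))
```

Fixing the seed and storing the two index lists keeps the assignment stable across experiments, so an example that starts in the Blackbox dev set never drifts into the subset the team reads by hand.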

Key points:

  • Split the dev set into an Eyeball dev set and a Blackbox dev set.
  • Use the Eyeball dev set for manual error analysis of misclassified examples.
  • Keep the Blackbox dev set hands-off by avoiding any manual examination of its contents.

Rubric: The response must propose splitting the dev set into an Eyeball dev set and a Blackbox dev set, specify that the Eyeball dev set is manually examined for error analysis, and specify that the Blackbox dev set must not be manually examined to maintain objective evaluation.

Updated 2026-05-27

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI
