Case Study

Diagnosing a Performance Gap After Development

Case context: Your team has just finished a three-month development cycle for a new image classification algorithm. Throughout the process, the team continuously checked performance against the dev set, tweaking the architecture to maximize accuracy. When development concluded, the algorithm achieved 98% accuracy on the dev set but only 82% on the test set.

Question: Based on the provided scenario, what specific problem should you diagnose regarding your dataset evaluation, and what is the recommended next step to address this issue?

Sample answer: The team should diagnose that the algorithm has gradually overfit to the dev set due to repeated evaluation during the three-month cycle. This is evident from the massive performance gap (98% vs 82%) between the dev and test sets. The recommended next step is to get a fresh dev set to replace the one that the algorithm has overfit to.

Key points:

  • Diagnose dev-set overfitting.
  • Identify that the large gap between dev (98%) and test (82%) performance indicates the overfitting.
  • Recommend getting a fresh dev set.

Rubric: Full credit is given for diagnosing dev-set overfitting due to repeated evaluations and prescribing the acquisition of a new dev set.

0

1

Updated 2026-05-27

Contributors are:

Who are from:

Tags

Machine Learning

Deep Learning

Machine Learning Strategy

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Yearning @ DeepLearning.AI