Case Study

Evaluating a Dev Set Size for Detecting 0.1% Classifier Improvements

Case context: A team is building a speech recognition system and expects upcoming model updates to yield small but meaningful accuracy improvements of about 0.1% (that is, 0.1 percentage points). Their current dev set contains 1,000 examples, which sits at the low end of the commonly used range of 1,000 to 10,000 examples.

Question: Based on Andrew Ng's guidelines, evaluate whether their current dev set size of 1,000 examples is sufficient for detecting a 0.1% improvement, and recommend a specific dev set size that would give them a good chance of detecting this change.

Sample answer: A dev set of 1,000 examples is too small to reliably detect a 0.1% improvement. On 1,000 examples, a 0.1% change amounts to a single example, which is indistinguishable from random noise. On 10,000 examples, the same change amounts to 10 examples. To have a good chance of detecting a 0.1% improvement, the team should expand the dev set to 10,000 examples.

Key points:

  • A dev set size of 1,000 examples is within the common range but is too small to detect a 0.1% improvement.
  • A 0.1% change on a dev set of 1,000 examples corresponds to only 1 example.
  • A dev set of 10,000 examples gives a good chance of detecting a 0.1% improvement.
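
The arithmetic behind these key points can be checked with a short sketch. It is only an illustration: the function name examples_changed is hypothetical, and the 100-example row is an extra comparison point that is not part of the case. The improvement is stored as an exact fraction so that floating-point rounding does not blur the counts.

```python
from fractions import Fraction

# 0.1% (0.1 percentage points) as an exact fraction, to avoid float rounding.
IMPROVEMENT = Fraction(1, 1000)


def examples_changed(dev_set_size: int, improvement: Fraction = IMPROVEMENT) -> Fraction:
    """Return how many dev examples must change outcome to shift accuracy by `improvement`."""
    return dev_set_size * improvement


# 1,000 and 10,000 come from the case; 100 is an illustrative extra.
for size in (100, 1_000, 10_000):
    changed = examples_changed(size)
    print(f"{size:>6} examples: a 0.1% change = {float(changed):g} example(s)")
```

With 1,000 examples the whole improvement rests on one example, while with 10,000 examples it rests on ten. The sketch only counts examples; it does not estimate statistical significance.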

Rubric: The student must identify that 1,000 examples is too small to reliably detect a 0.1% improvement and recommend increasing the dev set to 10,000 examples to have a good chance of detecting a 0.1% change.

Updated 2026-05-27

Tags

Machine Learning

Deep Learning

Machine Learning Strategy

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Yearning @ DeepLearning.AI
