1Cademy - Sizing the Dev Set to Detect Meaningful Accuracy Changes

Learn Before

Development Set (Dev Set)

Concept

Sizing the Dev Set to Detect Meaningful Accuracy Changes

A dev set should be large enough to detect differences between the algorithms being tried. For example, on a 100-example dev set each example accounts for a full percentage point of accuracy, so it cannot detect a 0.1 percentage-point difference such as 90.0% versus 90.1%. Dev sets of 1,000 to 10,000 examples are common, and a dev set can be much larger than 10,000 examples when a team wants to detect even smaller improvements.
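The arithmetic behind this can be sketched in a few lines of Python. This is a minimal illustration, not a sizing rule: the function names `accuracy_step_pp` and `examples_changed` are hypothetical, and the sketch only looks at measurement resolution (how many examples a given accuracy change corresponds to), not at sampling noise.

```python
def accuracy_step_pp(n_examples: int) -> float:
    """Smallest accuracy change the dev set can register, in percentage points.

    One example switching between wrong and right moves accuracy by 100/n points.
    """
    return 100.0 / n_examples


def examples_changed(n_examples: int, delta_pp: float) -> float:
    """Number of dev examples that a change of `delta_pp` percentage points represents."""
    return n_examples * delta_pp / 100.0


if __name__ == "__main__":
    target_pp = 0.1  # the 90.0% vs 90.1% difference from the text
    for n in (100, 1_000, 10_000, 100_000):
        step = accuracy_step_pp(n)
        changed = examples_changed(n, target_pp)
        print(f"n={n:>7}: step={step:.3f} pp, "
              f"a {target_pp} pp change is {changed:g} example(s)")
```

With 100 examples, a 0.1 percentage-point change would correspond to a tenth of one example, which cannot happen. With 10,000 examples it corresponds to about 10 examples. Being able to register a change is a necessary condition only. Whether a difference of that size reflects a real improvement is a separate question of statistical significance.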

Updated 2026-07-19

References

Machine Learning Yearning (Deeplearning.ai)

Learn After

Small Dev Sets Cannot Detect Tiny Accuracy Differences
Common Dev Set Sizes for Detecting 0.1% Improvements
High-Value Applications May Need Larger Dev Sets
Statistical Significance Tests on Dev Set Changes
Which dev set best supports detecting a 0.1 percentage-point accuracy improvement?
A dev set should always be made as large as possible, even beyond what meaningful-change detection requires.
A dev set of _____ examples gives a good chance of detecting a 0.1% improvement.
Match each accuracy-detection goal with its dev set implication.
Order the reasoning process for sizing a dev set around a meaningful accuracy change.
Explain how the size of a meaningful accuracy improvement should influence dev set size.
Decide whether a mature recommendation system needs a larger dev set.
Why is a 100-example dev set inadequate for comparing 90.0% and 90.1% accuracy?
When is a dev set much larger than 10,000 examples most justified?
A team satisfied with detecting meaningful changes need not enlarge its dev set far beyond that requirement.
