Case Study

Evaluating dev and test set sizes for a billion-scale image classification system

Case context: An engineering team is building an image classification system on a dataset of 1 billion images. Following the traditional textbook heuristic, a junior engineer proposes a 70/30 train/test split, which would reserve 300 million images for the test set.

Question: Diagnose the issue with the junior engineer's proposal based on Andrew Ng's guidelines. What should the team decide regarding the fraction and absolute size of the test set, and what is the guiding principle for determining the size of dev/test sets?

Sample answer: The junior engineer's proposal is flawed because the 70/30 split heuristic does not apply to big-data problems with a billion examples. Reserving 300 million examples for the test set is far more than is needed to measure performance, and it withholds data that could otherwise be used for training. The team should use a much smaller fraction (far less than 30%) for the dev/test sets. The guiding principle is that dev/test sets need to be only large enough to evaluate algorithm performance reliably; anything beyond that adds no value.

Key points:

  • A 70/30 split heuristic is inappropriate and wasteful for a dataset of 1 billion examples.
  • The fraction of data allocated to the test set should be shrunk to much less than 30%.
  • The dev/test set size should be determined solely by what is required to evaluate algorithm performance.
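To give a rough sense of what "large enough to evaluate performance" can mean in numbers, here is a minimal sketch. It is not part of the case study: it assumes test examples are independent draws and uses the normal approximation to the binomial to estimate the uncertainty of a measured accuracy. The function names, the 95% confidence level, and the ±0.1% target margin are illustrative assumptions.

```python
import math

Z_95 = 1.96  # assumed: two-sided 95% quantile of the standard normal


def margin_of_error(n, accuracy=0.5, z=Z_95):
    """Approximate half-width of the confidence interval for an accuracy
    measured on n independent test examples (normal approximation).
    accuracy=0.5 is the worst case, giving the widest interval."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n)


def required_test_size(margin, accuracy=0.5, z=Z_95):
    """Approximate smallest n whose margin of error is at most `margin`."""
    return math.ceil(z * z * accuracy * (1.0 - accuracy) / (margin * margin))


TOTAL = 1_000_000_000  # dataset size from the case study

# Compare a few candidate test-set sizes, including the proposed 300 million.
for n in (10_000, 1_000_000, 300_000_000):
    print(f"n={n:>11,}  fraction={n / TOTAL:.4%}  "
          f"margin=+/-{margin_of_error(n):.4%}")

# Size needed for an (assumed) target margin of +/-0.1 percentage points.
n_needed = required_test_size(margin=0.001)
print(f"needed for +/-0.1%: {n_needed:,} ({n_needed / TOTAL:.4%} of the data)")
```

Under these assumptions, a test set of roughly one million images (about 0.1% of the data) already pins accuracy down to within about a tenth of a percentage point, so the extra precision bought by 300 million images is hard to justify. The right target margin depends on how small a difference between models the team needs to detect.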

Rubric: The response should correctly identify that a 70/30 split is inappropriate for a billion-scale dataset. It must state that the test set fraction should shrink to much less than 30%. It must specify that the test set size is determined by what is needed to evaluate performance rather than by a fixed percentage.

Updated 2026-05-26

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy
