Evaluating dev and test set sizes for a billion-scale image classification system
Case context: An engineering team is building an image classification system with 1 billion images. Following traditional textbook guidelines, a junior engineer proposes using a 70/30 train/test split, meaning 300 million images will be reserved for the test set.
Question: Diagnose the issue with the junior engineer's proposal based on Andrew Ng's guidelines. What should the team decide regarding the fraction and absolute size of the test set, and what is the guiding principle for determining the size of dev/test sets?
Sample answer: The junior engineer's proposal is incorrect because the 70/30 split heuristic does not apply to big-data problems with a billion examples. Reserving 300 million examples for the test set is excessively large and unnecessary. The team should decide to use a much smaller fraction (far less than 30%) for the dev/test sets. The guiding principle is that dev/test sets only need to be large enough to reliably evaluate algorithm performance, and they should not be excessively large beyond that requirement.
Key points:
- A 70/30 split heuristic is inappropriate and wasteful for a dataset of 1 billion examples.
- The fraction of data allocated to the test set should be shrunk to much less than 30%.
- The dev/test set size should be determined solely by what is required to evaluate algorithm performance.
Rubric: Response should correctly identify that a 70/30 split is inappropriate for a billion-scale dataset. It must state that the test set fraction should shrink and be much less than 30%. It must specify that the test set size is determined by what is needed to evaluate performance rather than a fixed percentage.
0
1
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
For which dataset size is the 70/30 train/test split heuristic most appropriate?
True or False: As ML datasets grow to billions of examples, the fraction of data allocated to dev/test sets should grow proportionally to maintain the 70/30 split.
For large-scale ML problems, the _____ of data allocated to dev/test sets has been shrinking even as the absolute number of examples grows.
For which example-count range does the 70/30 train/test split heuristic work well, according to Andrew Ng?
As total dataset size grows into the billions, the fraction of data allocated to dev/test sets also increases.
The 70/30 train/test heuristic works well when you have a _____ number of examples, such as 100 to 10,000.
Match each dataset scale or concept to the correct dev/test sizing implication from Machine Learning Yearning.
Order the reasoning steps Ng uses to argue that the 70/30 heuristic does not apply to large datasets.
In big-data ML problems, what happens to the absolute number of dev/test examples as total dataset size grows?
Dev/test sets should be no larger than what is needed to evaluate the performance of your algorithms.
One popular heuristic was to use _____ of your data for your test set, though this does not apply to large datasets.
Match each train/dev/test splitting term to its correct definition from Machine Learning Yearning.
Order the steps a practitioner should follow when deciding test set size as their dataset grows from thousands to billions of examples.
Why does the traditional 70/30 train/test split heuristic fail to scale efficiently for big-data machine learning problems?
Evaluating dev and test set sizes for a billion-scale image classification system
Determining dev/test set size in the era of big-data machine learning