Evaluating a Dev Set Size for Detecting 0.1% Classifier Improvements
Case context: A team is building a speech recognition system and expects upcoming model updates to yield small but meaningful accuracy improvements of approximately 0.1%. Their current dev set contains 1,000 examples, which is at the low end of the common range of 1,000 to 10,000 examples.
Question: Based on Andrew Ng's guidelines, evaluate whether their current dev set size of 1,000 examples is sufficient for detecting a 0.1% improvement, and recommend a specific dev set size that would give them a good chance of detecting this change.
Sample answer: A dev set of 1,000 examples is insufficient to reliably detect a 0.1% improvement. On 1,000 examples, a 0.1% change corresponds to only 1 example, and a difference of a single example cannot be distinguished from random noise. To have a good chance of detecting a 0.1% improvement, the team should expand their dev set to 10,000 examples, where the same change corresponds to 10 examples.
Key points:
- A dev set size of 1,000 examples is within the common range but is too small to detect a 0.1% improvement.
- A 0.1% change on a dev set of 1,000 examples corresponds to only 1 example.
- A dev set size of 10,000 examples gives a good chance of detecting a 0.1% improvement.
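The arithmetic behind these key points can be checked with a short Python sketch. The function names `examples_changed` and `accuracy_resolution` are hypothetical helpers introduced only for illustration; they are not taken from Machine Learning Yearning. The sketch only counts examples and does not perform a statistical significance test.

```python
def examples_changed(dev_size: int, improvement: float) -> float:
    """Number of dev set examples that an accuracy change of `improvement` represents."""
    return dev_size * improvement


def accuracy_resolution(dev_size: int) -> float:
    """Smallest accuracy step the dev set can register: one example out of dev_size."""
    return 1.0 / dev_size


IMPROVEMENT = 0.001  # the 0.1% improvement from the case context

for dev_size in (1_000, 10_000):
    changed = examples_changed(dev_size, IMPROVEMENT)
    step = accuracy_resolution(dev_size)
    print(
        f"dev set of {dev_size:>6,} examples: "
        f"0.1% corresponds to {changed:.0f} example(s); "
        f"one example equals {step:.2%} accuracy"
    )
```

With 1,000 examples, the whole improvement rests on a single example. With 10,000 examples it rests on 10, which is why the larger set gives a better chance of seeing the change.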
Rubric: The student must identify that 1,000 examples is too small to reliably detect a 0.1% improvement and recommend increasing the dev set to 10,000 examples to have a good chance of detecting a 0.1% change.
Tags
Machine Learning
Deep Learning
Machine Learning Strategy
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Yearning @ DeepLearning.AI
Related
How many dev set examples does Andrew Ng recommend to reliably detect a 0.1% accuracy improvement?
True or False: A dev set of 1,000 examples gives you a good chance of detecting a 0.1% accuracy improvement.
Dev sets with _____ to 10,000 examples are described as common in Machine Learning Yearning.
Which range represents the common size for a dev set according to Andrew Ng's Machine Learning Yearning?
True or False: A dev set of 10,000 examples gives you a good chance of detecting an accuracy improvement of 0.1%.
With _____ dev set examples, you have a good chance of detecting an accuracy improvement of 0.1%.
Match each dev set concept to its correct description.
Order the steps for reasoning about whether your dev set is large enough to detect a 0.1% improvement.
What accuracy improvement can you expect to reliably detect with a 10,000-example dev set, per Machine Learning Yearning?
True or False: A dev set of 1,000 examples is too small to fall within the commonly recommended dev set size range.
Dev sets with sizes from _____ to 10,000 examples are considered common in practice.
Match each dev set scenario to its correct outcome or classification.
Order the reasoning steps behind Andrew Ng's recommendation of 10,000 examples for detecting 0.1% improvements.
Explain the relationship between dev set size and the ability to detect minor accuracy improvements, citing common size ranges.
Determining the Dev Set Size for 0.1% Performance Changes