Evaluate the dev and test set distribution strategy for a mobile app development team experiencing slow progress.
Case context: A development team is building a commercial mobile app that recognizes handwritten notes. To speed up training, they collected high-resolution scanner images for their dev set, but they evaluate the final system on a test set of low-resolution photos taken with mobile phones. The team is struggling with slow progress and inconsistent performance metrics.
Question: Based on Andrew Ng's recommendations for making progress on a specific machine learning application, diagnose the issue with the team's current dataset strategy and propose a solution to improve their efficiency.
Sample answer: The team's dev and test sets are drawn from different distributions (scanner images vs. mobile phone photos). For a specific machine learning application, drawing dev and test sets from different distributions reduces team efficiency: improvements measured on the scanner-image dev set may not carry over to the phone-photo test set, which is consistent with the inconsistent metrics the team reports. To become more efficient, the team should draw both the dev and test sets from the same distribution (the mobile phone photos). Developing algorithms that train on one distribution and generalize to another is an important research problem, but pursuing it slows a team whose goal is progress on a specific application.
Key points:
- The team's dev and test sets are currently from different distributions.
- This mismatch decreases the team's efficiency in making application progress.
- The team should choose dev and test sets from the same distribution (mobile phone photos).
Rubric: The answer should identify that the dev and test sets are currently from different distributions, state that this mismatch reduces team efficiency, and propose drawing both dev and test sets from the same distribution (mobile phone photos) to align with application-focused progress.
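The fix described in the sample answer can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual pipeline: the helper `split_dev_test`, the 50/50 split, the fixed seed, and the file names are all assumptions made for the example. The only point it demonstrates is that dev and test examples are both sampled from one pool of mobile phone photos.

```python
import random


def split_dev_test(examples, dev_fraction=0.5, seed=0):
    """Shuffle examples from ONE source and split them into dev and test sets.

    Hypothetical helper: the name, default fraction, and seed are
    illustrative choices, not taken from the source material.
    """
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a reproducible split
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder file names standing in for the team's mobile phone photos.
phone_photos = [f"phone_{i:04d}.jpg" for i in range(1000)]

# Both sets come from the same pool, so they share one distribution.
dev_set, test_set = split_dev_test(phone_photos)
```

Because both sets are slices of the same shuffled pool, a gain measured on `dev_set` is a reasonable signal of what to expect on `test_set`. Scanner images are deliberately left out of both sets in this sketch.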
Tags
Machine Learning
Deep Learning
Machine Learning Strategy
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Related
When your goal is progress on a specific ML application, how should dev and test sets be chosen?
True or False: Developing algorithms that train on one distribution and generalize to another is described as an important research problem.
For progress on a specific ML application, dev and test sets should be drawn from the _____ distribution to make the team more efficient.
When your goal is application progress, which dev/test set strategy does Andrew Ng recommend?
Choosing dev and test sets from the same distribution makes a team more efficient when building a specific ML application.
For application progress, dev and test sets should be drawn from the _____ distribution.
Match each term to its correct description from Machine Learning Yearning Chapter 5.
Order the reasoning steps for selecting dev/test distributions in an application-focused ML project.
What does Machine Learning Yearning describe as an 'important research problem' regarding data distributions?
Ng's recommendation to use same-distribution dev/test sets applies equally to both application progress and research progress goals.
Choosing dev and test sets from the same distribution will make your _____ more efficient.
Match each recommendation or outcome to the correct project goal from Machine Learning Yearning.
Order the steps for distinguishing application vs. research goals when deciding on dev/test distributions.
Explain how the choice of data distributions for dev and test sets impacts team efficiency in machine learning application development.
Contrast the dev/test set distribution strategy for application progress versus research progress.