Evaluate a team's struggles with data synthesis for an image classifier.
Case context: Your team has spent three weeks generating synthetic images for a computer vision project. So far, adding these synthetic images to the training set has not improved the model's accuracy on the dev set. Some team members want to abandon data synthesis, arguing it is a waste of time.
Question: Based on Andrew Ng's advice, what should you diagnose as the likely issue, and what decision should you make regarding the synthetic data effort?
Sample answer: The likely issue is that the details of the synthetic images are not yet close enough to the real data distribution. Andrew Ng notes that getting these details right can take weeks before synthetic data has a significant effect. The team should therefore not abandon the effort immediately. Instead, it should diagnose which details are missing and refine the generation process to better match the real distribution, because getting it right gives access to a far larger training set.
Key points:
- The synthetic details likely do not match the actual distribution yet.
- It is common for this process to take weeks before seeing a significant effect.
- The team should continue refining the data due to the potential payoff of a much larger training set.
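One way to start diagnosing which details are missing is to compare simple per-image statistics between real and synthetic images. The sketch below is only an illustration under assumptions, not a method taken from Machine Learning Yearning: the feature names ("brightness", "blur"), the helper functions, and the 0.5 threshold are all hypothetical, and the per-image values are assumed to have been computed elsewhere.

```python
from statistics import mean, pstdev


def standardized_gaps(real, synthetic):
    """Return {feature: |mean_real - mean_synthetic| / pooled_std}.

    `real` and `synthetic` map a hypothetical feature name to a list of
    per-image values (how those values are measured is up to the team).
    """
    gaps = {}
    for name in real:
        r, s = real[name], synthetic[name]
        # Pooled standard deviation of the two groups.
        pooled = ((pstdev(r) ** 2 + pstdev(s) ** 2) / 2) ** 0.5
        diff = abs(mean(r) - mean(s))
        if pooled > 0:
            gaps[name] = diff / pooled
        else:
            # No spread in either group: identical means give 0, else infinite gap.
            gaps[name] = 0.0 if diff == 0 else float("inf")
    return gaps


def mismatched_features(real, synthetic, threshold=0.5):
    """List features whose gap exceeds `threshold` (an arbitrary assumption),
    largest gap first."""
    gaps = standardized_gaps(real, synthetic)
    flagged = [name for name, gap in gaps.items() if gap > threshold]
    return sorted(flagged, key=lambda name: -gaps[name])


# Toy values for illustration only; these are not measurements.
real_stats = {"brightness": [0.5, 0.6, 0.4, 0.5], "blur": [1.0, 1.2, 0.8, 1.0]}
synthetic_stats = {"brightness": [0.5, 0.6, 0.4, 0.5], "blur": [0.1, 0.2, 0.0, 0.1]}
flagged = mismatched_features(real_stats, synthetic_stats)
```

With these toy values only "blur" is flagged, which would suggest refining how blur is rendered in the synthetic images. A check like this covers only the statistics the team thinks to measure, so a clean result does not show that the distributions match.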
Rubric: The response should identify that the synthetic data details likely do not match the real distribution yet, note that spending weeks without success is a common challenge, and recommend refining the data rather than abandoning the effort.
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI
Related
What condition must synthetic data satisfy before it can have a significant effect on model training?
True or False: Getting synthetic data details close enough to the actual distribution can take several weeks of work.
Synthesized data must have details close enough to the _____ before it has a significant effect on training.
Match each synthetic data scenario to its expected outcome according to Machine Learning Yearning.
Order the stages of developing effective synthetic data from initial generation to meaningful training impact.
What does Andrew Ng identify as the primary benefit of successfully matching synthetic data details to the real distribution?
True or False: Andrew Ng describes the process of getting synthetic data details right as straightforward and easy to follow.
If you get the details of synthetic data right, you can suddenly access a far _____ training set than before.
Match each key concept from the synthetic data synthesis process with its correct description from Machine Learning Yearning.
Order the reasoning steps a practitioner should follow when deciding whether to invest in synthetic data synthesis.
Analyze the trade-offs of investing time to refine synthetic data details.
Explain the prerequisite for synthetic data to improve model training.