Diagnosing a cat recognizer with high training and dev error rates.
Case context: You are building a cat recognizer with a target error rate of 5%. Currently, the model achieves a 15% error rate on the training set and a 16% error rate on the dev set. Your colleague proposes collecting 10,000 more training images to resolve this issue.
Question: Based on the error rates and target, diagnose the primary problem, evaluate the proposed solution, and state what the first step should be to improve performance.
Sample answer: The model is suffering from avoidable bias: its training error of 15% is far above the 5% target. Collecting 10,000 more training images is unlikely to help. Adding training data mainly reduces variance, and variance is already low here because training error (15%) and dev error (16%) are only 1 percentage point apart. More data usually has no significant effect on bias. The first step should be to improve the algorithm's performance on the current training set rather than to gather more data.
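The arithmetic behind this diagnosis can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the source: the function name `diagnose` and the simple "larger gap wins" rule are hypothetical, and the 5% target is assumed to stand in for the best achievable error rate.

```python
def diagnose(train_err, dev_err, target_err):
    """Split the error into avoidable bias and variance (percentage points).

    Assumption: target_err is used as a proxy for the best achievable error.
    The "larger gap wins" rule below is a hypothetical rule of thumb.
    """
    avoidable_bias = train_err - target_err  # gap between training error and target
    variance = dev_err - train_err           # gap between dev error and training error

    if avoidable_bias > variance:
        focus = "bias"
    elif variance > avoidable_bias:
        focus = "variance"
    else:
        focus = "both"
    return avoidable_bias, variance, focus


# Numbers from the case: 15% training error, 16% dev error, 5% target.
bias, variance, focus = diagnose(train_err=15.0, dev_err=16.0, target_err=5.0)
print(f"avoidable bias: {bias} points, variance: {variance} points, focus: {focus}")
```

With the case's numbers, the avoidable bias is 10 percentage points and the variance is 1 percentage point, so the sketch points at bias as the thing to work on first.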
Key points:
- Identify avoidable bias as the primary issue due to the 10-percentage-point gap between training error (15%) and target error (5%).
- Explain that collecting more training data is an ineffective solution because it addresses variance, not bias.
- Recommend focusing first on improving the model's performance on the training set.
Rubric: Diagnose avoidable bias based on the 15% training error vs 5% target error. Evaluate the proposal to collect more data as ineffective because it addresses variance rather than bias. Recommend focusing on improving training-set performance first.
References
Machine Learning Yearning (Deeplearning.ai)
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
Your cat recognizer has 15% training error and 16% dev error, but your target is 5% error. What should you do first?
True or False: Adding more training data is an effective technique for reducing high training error caused by high avoidable bias.
Adding more training data is a technique that helps with _____ problems, but it usually has no significant effect on bias.
Your training error is 15% and your target is 5%. What should you focus on first?
True or False: Adding more training data is an effective technique for reducing high avoidable bias.
Adding more training data helps with _____ problems but usually has no significant effect on bias.
Match each scenario or technique to its correct description regarding bias and variance.
Order the steps for correctly diagnosing and responding to a model with high training error.
Training error is 15%, dev error is 16%, and target is 5%. What does this pattern primarily indicate?
True or False: You should expect significant dev/test improvement even if training error remains high after adding more data.
When training error is high, first improve performance on the _____ set before expecting dev/test performance to improve.
Match each error pattern to the correct diagnosis and recommended action.
Order the decision-making steps a practitioner should follow when evaluating whether to add more training data.
Analyze why expanding the training dataset fails to resolve high training-set error.
Identify the primary system metric to improve before expecting dev/test set gains.