Analyze why expanding the training dataset fails to resolve high training-set error.
Question: According to the course principles, explain why adding more training data is an ineffective technique when a model's training error is significantly higher than the target error, and describe what the developer must focus on instead.
Sample answer: When training error is well above the target error, the primary issue is high avoidable bias. Adding more training data helps with variance problems (where the model fails to generalize from the training set to new data), but it does not improve the model's ability to fit the training set it already has: a model that cannot fit its current examples will not fit a larger set of similar examples any better. Therefore, adding data usually has no significant effect on bias. Instead, the developer must first focus on improving the algorithm's performance on the training set before expecting any improvement on the dev or test sets, since dev and test error are usually no lower than training error.
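The reasoning above rests on splitting the error into two gaps. The sketch below is a minimal illustration, assuming errors are given in percentage points and using a hypothetical helper `diagnose` (not from the source): avoidable bias is training error minus target error, and variance is dev error minus training error. Whichever gap is larger is the one to work on first.

```python
def diagnose(train_error: float, dev_error: float, target_error: float) -> dict:
    """Hypothetical helper: split the dev-vs-target gap into bias and variance.

    All errors are in percentage points (e.g. 15.0 means 15% error).
    """
    # Avoidable bias: how far the model falls short of the target on data it trained on.
    avoidable_bias = train_error - target_error
    # Variance: how much worse the model does on unseen dev data than on training data.
    variance = dev_error - train_error
    # Attack the larger gap first; ties go to bias, since training error must come down first.
    if avoidable_bias >= variance:
        focus = "reduce avoidable bias: improve training-set performance first"
    else:
        focus = "reduce variance: adding training data can help"
    return {"avoidable_bias": avoidable_bias, "variance": variance, "focus": focus}


result = diagnose(train_error=15.0, dev_error=16.0, target_error=5.0)
print(result)
```

For a cat recognizer with 15% training error, 16% dev error, and a 5% target, `diagnose(15.0, 16.0, 5.0)` gives an avoidable bias of 10.0 points and a variance of 1.0 point, so the focus is bias; more data could at best close the 1-point variance gap.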
Key points:
- Adding more training data helps with variance problems, not bias.
- A high training error relative to target error indicates avoidable bias.
- Adding training data has no significant effect on reducing bias.
- Practitioners must first improve performance on the training set.
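The key points can be checked with a small simulation. This is a sketch under a hypothetical setup that is not from the source: a straight-line model (`fit_line`, ordinary least squares) is fit to quadratic data, so the model is too simple to fit even its own training set (high bias). Its training error should stay roughly flat, near 0.1 in this setup, whether it sees 100 or 10,000 examples, which is the sense in which more data "has no significant effect" on bias.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def training_mse(n, seed=0):
    """Fit a straight line to n samples of a quadratic and return the training MSE."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Hypothetical data: y = x^2 plus small Gaussian noise (sigma = 0.1).
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    a, b = fit_line(xs, ys)
    # Error measured on the same examples the model was fit to (training error).
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n


# Growing the training set 100x barely changes the training error of an underfitting model.
for n in (100, 1_000, 10_000):
    print(n, round(training_mse(n), 3))
```

Only a more expressive model (here, one that can represent the curve) would lower this training error; more examples of the same data cannot.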
Rubric: The response should clearly state that adding more data helps with variance but not with bias or high training error. It must identify high avoidable bias as the problem and state that the developer must prioritize improving training-set performance before expecting dev/test-set improvements.
References
Machine Learning Yearning (Deeplearning.ai)