Explaining the Co-existence of Avoidable Bias, Variance, and Data Mismatch
Question: A machine learning algorithm can suffer from any subset of high avoidable bias, high variance, and data mismatch. Explain why these three problems are not mutually exclusive. Specifically, describe how an algorithm could exhibit all three at once, and why addressing one does not automatically resolve the others.
Sample answer: Avoidable bias, variance, and data mismatch are distinct sources of error, each estimated from a different gap. Avoidable bias is the gap between human-level error (used as a proxy for the best achievable error) and training error. Variance is the gap between training error and training-dev error, where the training-dev set is held-out data drawn from the same distribution as the training set. Data mismatch is the gap between training-dev error and dev error. Each gap compares a different pair of measurements and has a different cause: underfitting the training data, failing to generalize within the training distribution, and a shift between the training and dev distributions. The size of one gap therefore does not determine the size of another, and a model can suffer from any subset of these problems, including large gaps in all three at once. Addressing one problem, such as increasing model size to reduce avoidable bias, does not fix the others: a larger model still faces the same mismatch between the training and dev data distributions.
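The gap definitions in the sample answer can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: the `ErrorRates` class, the `flagged_problems` function, the 0.02 threshold for calling a gap "high", and all error rates below are hypothetical values chosen for the example. In practice, what counts as a large gap is a judgment call that depends on the application.

```python
from dataclasses import dataclass


@dataclass
class ErrorRates:
    """Error rates (fractions in [0, 1]) on each evaluation set. Hypothetical container."""
    human_level: float   # proxy for the best achievable error
    training: float      # error on the training set
    training_dev: float  # held-out data from the training distribution
    dev: float           # data from the target (dev/test) distribution


def error_gaps(e: ErrorRates) -> dict:
    """Return the three gaps described in the sample answer."""
    return {
        "avoidable_bias": e.training - e.human_level,
        "variance": e.training_dev - e.training,
        "data_mismatch": e.dev - e.training_dev,
    }


def flagged_problems(e: ErrorRates, threshold: float = 0.02) -> list:
    """List the gaps exceeding `threshold` (an arbitrary cutoff assumed for illustration)."""
    return sorted(name for name, gap in error_gaps(e).items() if gap > threshold)


# Made-up numbers: every gap is large, so all three problems co-exist.
before = ErrorRates(human_level=0.01, training=0.10, training_dev=0.15, dev=0.25)
all_three = flagged_problems(before)

# Suppose a larger model lowers training error to 0.02 while the other
# two gaps stay the same size (again, made-up numbers).
after = ErrorRates(human_level=0.01, training=0.02, training_dev=0.07, dev=0.17)
remaining = flagged_problems(after)

print(all_three)
print(remaining)
```

In the second scenario only the avoidable-bias gap has closed; variance and data mismatch are still flagged, which is the sense in which fixing one problem does not automatically resolve the others.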
Key points:
- Avoidable bias, variance, and data mismatch are distinct sources of error, each measured by a different performance gap.
- Because they stem from distinct causes, an algorithm can suffer from any subset (any combination) of these three problems.
- Resolving one source of error does not automatically fix or reduce the others.
Rubric: The response must explain that avoidable bias, variance, and data mismatch are distinct sources of error, each measured by a different performance gap. It must state that they can co-exist in any subset or combination, and explain that resolving one does not automatically fix the others.
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
High Avoidable Bias and Data Mismatch Without High Variance
Which statement best describes how avoidable bias, variance, and data mismatch can affect a single learning algorithm?
True or False: A learning algorithm can exhibit high avoidable bias and data mismatch at the same time without necessarily having high variance.
According to Machine Learning Yearning, it is possible for an algorithm to suffer from any _____ of high avoidable bias, high variance, and data mismatch.
Which statement best describes how high avoidable bias, high variance, and data mismatch can co-exist in a single algorithm?
An algorithm can exhibit high variance and data mismatch simultaneously, without suffering from high avoidable bias.
It is possible for an algorithm to suffer from any _____ of high avoidable bias, high variance, and data mismatch.
Match each of the three error sources to the comparison that most directly reveals it.
Order the diagnostic steps for identifying which subset of the three error sources affects an algorithm.
Training error equals human-level error, training-dev error closely matches training error, but dev error is far higher. Which subset of problems is present?
An algorithm must always exhibit all three problems—high avoidable bias, high variance, and data mismatch—together; they cannot occur in isolation.
When training error ≈ human-level and training-dev ≈ training error, but dev error is much higher, the algorithm suffers from data _____ as its primary problem.
Match each two-problem combination to the diagnostic error-gap pattern it produces.
Order the reasoning steps for planning improvements when an algorithm is diagnosed with all three problems simultaneously.
Diagnosing Co-existing Errors in a Speech Recognition System
Subsets of Error Sources in Machine Learning Algorithms