Finding Training Data That Better Matches Difficult Dev Examples
When there is a data mismatch problem, one recommended option is to find more training data that better matches the dev-set examples the algorithm has trouble with. In the speech-recognition example, when the dev set mostly contains in-car audio while training examples were mostly recorded against a quiet background, one might try to acquire more training data made of audio clips taken in a car.
0
1
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
Example Data Mismatch Error Pattern
Finding Training Data That Better Matches Difficult Dev Examples
Addressing Data Mismatch by Comparing Data Properties
What defines data mismatch in the context of training and dev/test sets?
Data mismatch means the algorithm fails to generalize even to new data drawn from the training distribution.
Data mismatch is named because the training set data is a poor _____ for the dev/test set data.
Match each data mismatch term to its correct description.
Order the steps for diagnosing a data mismatch problem from training through evaluation.
What is the root cause of a data mismatch problem between training and dev/test sets?
A speech recognition system that does well on training and training dev sets but poorly on the dev set has a data mismatch problem.
An algorithm with data mismatch generalizes well to the _____ distribution but not to the dev/test distribution.
Match each performance pattern to its diagnostic interpretation regarding data mismatch.
Order the reasoning chain that leads from observation to a data mismatch conclusion.
Explain how generalization performance differences between training and dev/test distributions indicate data mismatch.
Diagnose generalization issues in a speech recognition system.
Explain the naming origin of the data mismatch problem.
Learn After
No Guaranteed Path for Addressing Data Mismatch
Artificial Data Synthesis for Dev-Set Matching
Representative Synthesized Training Examples
When a model performs well on training data but poorly on the dev set due to distribution differences, what is one recommended strategy?
A data mismatch problem exists when a model performs well on its training set but poorly on a dev set drawn from a different distribution.
When most dev set audio clips were recorded in a _____, one solution is to acquire more training data from that same setting.
Match each concept related to data mismatch to its correct description from ML Yearning.
Order the steps for diagnosing and addressing a data mismatch problem in a speech recognition system.
In ML Yearning's speech recognition example, what is the primary cause of the model's poor performance on the dev set?
According to ML Yearning, acquiring training data that matches the dev set distribution is guaranteed to resolve a data mismatch problem.
To address a data mismatch problem, one recommended option is to find more training data that better _____ the dev-set examples the algorithm has trouble with.
Match each component of the speech recognition scenario to its role in ML Yearning's data mismatch framework.
Order the reasoning steps a practitioner should follow when deciding to seek targeted training data to address a data mismatch.