Explain how generalization performance differences between training and dev/test distributions indicate data mismatch.
Question: Based on the provided concepts, explain why an algorithm can generalize well to new data from the training-set distribution but fail on the dev/test set distribution. What does this performance gap indicate about the training data?
Sample answer: This gap indicates a data mismatch problem. The algorithm generalizes well to new data from the training-set distribution because that is the distribution it learned from. It fails on the dev/test set distribution because the training set data is a poor match for the dev/test set data, so what it learned does not transfer.
Key points:
- The algorithm generalizes well to new data from the training-set distribution.
- The algorithm does not generalize well to data from the dev/test set distribution.
- A problem that produces this performance gap is called data mismatch.
- The underlying cause is that the training set data is a poor match for the dev/test set data.
Rubric: The answer should explain that the algorithm generalizes well to the training distribution but not the dev/test distribution because of data mismatch, which is caused by the training set data being a poor match for the dev/test set data.
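The reasoning above can be sketched as a small check on three error rates. This is only an illustrative sketch: the function name `diagnose_gap`, the 2% `threshold`, and the error values in the usage lines are hypothetical, and the "training dev" error is assumed to be measured on held-out data drawn from the training-set distribution.

```python
def diagnose_gap(train_err, train_dev_err, dev_err, threshold=0.02):
    """Split the train-to-dev error gap into two parts.

    train_err:     error on the training set
    train_dev_err: error on held-out data from the training-set distribution
                   (assumed to exist; not used for training)
    dev_err:       error on the dev set (the dev/test distribution)
    threshold:     hypothetical cutoff for calling a gap "large"
    """
    # Gap within the training distribution: how well the algorithm
    # generalizes to new data from the distribution it was trained on.
    same_dist_gap = train_dev_err - train_err
    # Gap across distributions: how much worse it does on dev/test data.
    cross_dist_gap = dev_err - train_dev_err
    return {
        "same_dist_gap": same_dist_gap,
        "cross_dist_gap": cross_dist_gap,
        # Data mismatch pattern: generalizes well within the training
        # distribution, but not to the dev/test distribution.
        "data_mismatch": same_dist_gap <= threshold and cross_dist_gap > threshold,
    }


# Hypothetical numbers: small gap within the training distribution,
# large gap on the dev set.
mismatch_case = diagnose_gap(train_err=0.01, train_dev_err=0.015, dev_err=0.10)
print(mismatch_case["data_mismatch"])

# Hypothetical numbers: the gap already appears within the training
# distribution, so the pattern above does not apply.
other_case = diagnose_gap(train_err=0.01, train_dev_err=0.09, dev_err=0.10)
print(other_case["data_mismatch"])
```

The first call matches the pattern described in the sample answer; the second does not, because the algorithm already fails to generalize to new data from the training-set distribution.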
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI
Related
Example Data Mismatch Error Pattern
Finding Training Data That Better Matches Difficult Dev Examples
Addressing Data Mismatch by Comparing Data Properties
What defines data mismatch in the context of training and dev/test sets?
Data mismatch means the algorithm fails to generalize even to new data drawn from the training distribution.
Data mismatch is named because the training set data is a poor _____ for the dev/test set data.
Match each data mismatch term to its correct description.
Order the steps for diagnosing a data mismatch problem from training through evaluation.
What is the root cause of a data mismatch problem between training and dev/test sets?
A speech recognition system that does well on training and training dev sets but poorly on the dev set has a data mismatch problem.
An algorithm with data mismatch generalizes well to the _____ distribution but not to the dev/test distribution.
Match each performance pattern to its diagnostic interpretation regarding data mismatch.
Order the reasoning chain that leads from observation to a data mismatch conclusion.
Diagnose generalization issues in a speech recognition system.
Explain the naming origin of the data mismatch problem.