Essay

Explain how generalization performance differences between training and dev/test distributions indicate data mismatch.

Question: Based on the provided concepts, explain why an algorithm can generalize well to new data from the training-set distribution but fail on the dev/test set distribution. What does this performance gap indicate about the training data?

Sample answer: This gap indicates a data mismatch problem. The algorithm generalizes well to new, unseen data drawn from the training-set distribution because that is the distribution it learned from. It fails on the dev/test set because that data comes from a different distribution, so the training data is a poor match for the data the algorithm is evaluated on.

Key points:

  • The algorithm generalizes well to new data from the training-set distribution.
  • The algorithm does not generalize well to data from the dev/test set distribution.
  • This performance gap indicates a problem called data mismatch.
  • The underlying cause is that the training set data is a poor match for the dev/test set data.
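
The key points above can be turned into a rough numerical check. The sketch below is illustrative only: it assumes you have held out some unseen data from the training-set distribution (called `train_dev_err` here, a hypothetical name) in addition to the usual training and dev errors. The function names, the example error rates, and the 0.02 threshold are all assumptions chosen for the example, not values from the source.

```python
def generalization_gaps(train_err, train_dev_err, dev_err):
    """Split the train-to-dev error gap into two parts.

    train_err:     error on the data the model was trained on
    train_dev_err: error on unseen data from the training-set distribution
    dev_err:       error on data from the dev/test distribution
    """
    return {
        # How much worse the model does on unseen data from the same distribution.
        "same_distribution_gap": train_dev_err - train_err,
        # How much worse it does once the distribution changes.
        "mismatch_gap": dev_err - train_dev_err,
    }


def indicates_data_mismatch(train_err, train_dev_err, dev_err, threshold=0.02):
    """Return True when the model generalizes within the training-set
    distribution but not to the dev/test distribution.

    The threshold is an arbitrary illustrative cutoff (an assumption).
    """
    gaps = generalization_gaps(train_err, train_dev_err, dev_err)
    return (gaps["same_distribution_gap"] <= threshold
            and gaps["mismatch_gap"] > threshold)


if __name__ == "__main__":
    # Hypothetical error rates: 1% training, 1.5% unseen same-distribution, 10% dev.
    print(generalization_gaps(0.01, 0.015, 0.10))
    print(indicates_data_mismatch(0.01, 0.015, 0.10))
```

With these hypothetical numbers, the error barely rises on unseen data from the training-set distribution but rises sharply on the dev data, which is the pattern the key points describe as data mismatch.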

Rubric: The answer should explain that the algorithm generalizes well to the training distribution but not the dev/test distribution because of data mismatch, which is caused by the training set data being a poor match for the dev/test set data.

Updated 2026-05-27

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI