Essay

Methodology for analyzing distribution discrepancies when data mismatch is detected

Question: Suppose your machine learning model performs well on its training set but does poorly on the dev set. Explain the recommended action to begin addressing this data mismatch, specifying what you need to analyze regarding the datasets and their distributions.

Sample answer: When the model does well on the training set but poorly on the dev set, and the gap is identified as a data mismatch problem, the recommended first step is to try to understand which properties of the data differ between the training and dev-set distributions.
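As a rough illustration of what "understanding which properties differ" can look like when the data has numeric features, the sketch below ranks features by how far their dev-set mean sits from their training-set mean, measured in pooled standard deviations. This is one possible heuristic and not a method prescribed by the answer above; the helper names and the `brightness` and `noise_level` features are hypothetical, and for images or audio the analysis usually starts with manually inspecting dev-set examples rather than with summary statistics.

```python
import math
from statistics import mean, pstdev


def standardized_mean_differences(train_rows, dev_rows):
    """Map each shared numeric feature to (dev mean - train mean) / pooled std."""
    shared = sorted(set(train_rows[0]) & set(dev_rows[0]))
    result = {}
    for name in shared:
        train_vals = [row[name] for row in train_rows]
        dev_vals = [row[name] for row in dev_rows]
        gap = mean(dev_vals) - mean(train_vals)
        pooled = ((pstdev(train_vals) ** 2 + pstdev(dev_vals) ** 2) / 2) ** 0.5
        if pooled > 0:
            result[name] = gap / pooled
        else:
            # Both sets are constant for this feature: no shift if the
            # constants match, otherwise an unbounded shift.
            result[name] = 0.0 if gap == 0 else math.copysign(math.inf, gap)
    return result


def rank_shifted_features(train_rows, dev_rows):
    """Return (feature, difference) pairs, largest absolute shift first."""
    diffs = standardized_mean_differences(train_rows, dev_rows)
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-example measurements (made-up values for illustration).
train_rows = [
    {"brightness": 0.8, "noise_level": 0.1},
    {"brightness": 0.7, "noise_level": 0.2},
    {"brightness": 0.9, "noise_level": 0.1},
    {"brightness": 0.8, "noise_level": 0.2},
]
dev_rows = [
    {"brightness": 0.3, "noise_level": 0.1},
    {"brightness": 0.4, "noise_level": 0.2},
    {"brightness": 0.2, "noise_level": 0.2},
    {"brightness": 0.3, "noise_level": 0.1},
]

ranked = rank_shifted_features(train_rows, dev_rows)
for name, diff in ranked:
    print(f"{name}: {diff:+.2f}")
```

With these made-up values, `brightness` ranks first because the dev-set examples are much darker than the training examples, while `noise_level` shows no shift. A ranking like this only suggests where to look; deciding whether a difference actually explains the dev-set errors still requires examining the examples themselves.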

Key points:

  • Acknowledge the detection of a data mismatch problem when the model does poorly on the dev set.
  • Specify the recommended action: try to understand the properties of the data.
  • Focus the analysis on how properties differ between the training and dev-set distributions.

Rubric: The response must specify that when a data mismatch is detected (due to poor performance on the dev set), the recommended step is to try to understand which properties of the data differ between the training and the dev-set distributions.

Updated 2026-05-26

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy
