Methodology for analyzing distribution discrepancies when data mismatch is detected
Question: Suppose your machine learning model performs well on its training set but does poorly on the dev set. Explain the recommended action to begin addressing this data mismatch, specifying what you need to analyze regarding the datasets and their distributions.
Sample answer: When the model performs much worse on the dev set than on the training set and a data mismatch problem has been identified, the recommended first action is to try to understand which properties of the data differ between the training and dev set distributions.
Key points:
- Acknowledge the detection of a data mismatch problem when the model does poorly on the dev set.
- Specify the recommended action: try to understand the properties of the data.
- Focus the analysis on how properties differ between the training and dev-set distributions.
Rubric: The response must specify that when a data mismatch is detected (due to poor performance on the dev set), the recommended step is to try to understand which properties of the data differ between the training and the dev-set distributions.
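Illustration: One way to start the comparison described in the answer is to compute a few per-example properties for both sets and see which ones differ most. The sketch below is a minimal example under stated assumptions, not a prescribed procedure: the property names (clip_seconds, noise_db), the sample values, and the helper functions standardized_gap and rank_property_gaps are all hypothetical, and the gap measure (difference in means scaled by the standard deviation of the combined values) is just one simple choice.

```python
from statistics import mean, pstdev


def standardized_gap(train_values, dev_values):
    """Absolute difference in means, scaled by the std of the combined values."""
    combined_std = pstdev(list(train_values) + list(dev_values))
    if combined_std == 0:
        # All values are identical, so there is no measurable gap.
        return 0.0
    return abs(mean(train_values) - mean(dev_values)) / combined_std


def rank_property_gaps(train_props, dev_props):
    """Return (property, gap) pairs sorted from largest to smallest gap."""
    gaps = {
        name: standardized_gap(train_props[name], dev_props[name])
        for name in train_props
        if name in dev_props
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-example properties for a speech dataset (assumed values).
train_props = {
    "clip_seconds": [4.0, 5.0, 6.0, 5.0],
    "noise_db": [10.0, 12.0, 11.0, 13.0],
}
dev_props = {
    "clip_seconds": [5.0, 4.0, 6.0, 5.0],
    "noise_db": [30.0, 28.0, 32.0, 31.0],
}

ranking = rank_property_gaps(train_props, dev_props)
for name, gap in ranking:
    print(f"{name}: standardized gap = {gap:.2f}")
```

In this made-up data the noise level differs sharply between the two sets while clip length does not, so noise would be the first property to inspect by hand in dev set examples. A summary statistic like this only points to candidates; looking at actual dev set examples is still needed to confirm what differs.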
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
Error Analysis for Data Mismatch
When a data mismatch problem is identified between training and dev sets, what is the recommended first step?
True or False: When a data mismatch problem is found, Andrew Ng recommends understanding which properties of the data differ between the training and dev set distributions.
When a data mismatch problem is found, a recommended step is to understand what _____ of the data differ between the training and dev set distributions.
According to Ng, what is the recommended first step after identifying a data mismatch problem?
When addressing data mismatch, Ng recommends analyzing which properties of the data differ between the training and dev distributions.
When a data mismatch problem is found, the recommended step is to understand which _____ of the data differ between training and dev-set distributions.
Match each term to its role in Ng's data mismatch framework.
Arrange the steps for diagnosing and beginning to address a data mismatch problem in the correct order.
What is the purpose of comparing data properties between training and dev sets when addressing a mismatch problem?
A data mismatch problem is indicated when a model performs poorly on both the training set and the dev set.
Data mismatch is identified when a model performs well on _____ data but poorly on the dev set.
Match each observation or action to its correct interpretation in Ng's data mismatch analysis process.
Arrange these reasoning steps in the correct order for understanding why data properties must be compared when mismatch is found.
Diagnosing performance difference between training and dev sets in a speech recognition system
Analyzing distribution differences to address poor dev set performance