Diagnose classifier performance using the four-dataset evaluation framework.
Case context: A practitioner is training a classifier where the training data and the dev/test data are drawn from different distributions. The practitioner sets up four separate datasets: a training set, a training dev set, a dev set, and a test set.
Question: Based on the four-dataset evaluation framework, explain what diagnosing performance on the training set, training dev set, and the dev and/or test sets respectively reveals about the classifier.
Sample answer: Evaluating on the training set reveals the training error. Evaluating on the training dev set reveals the algorithm's ability to generalize to new data drawn from the training-set distribution. Evaluating on the dev and/or test sets reveals the algorithm's performance on the target task.
Key points:
- The training set is used to evaluate training error.
- The training dev set is used to evaluate generalization to new data from the training distribution.
- The dev and/or test sets are used to evaluate performance on the target task.
Rubric: The response must correctly identify that: 1. Training set evaluation measures training error. 2. Training dev set evaluation measures generalization to new data from the training distribution. 3. Dev and/or test set evaluation measures performance on the target task.
0
1
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI
Related
Which dataset is used to evaluate an algorithm's ability to generalize to new data drawn from the training set distribution?
Training error is evaluated by running the algorithm on the training set.
The algorithm's performance on the task you care about is evaluated using the _____ and/or test sets.
Match each dataset in the four-dataset framework to its primary evaluation purpose.
Order the four dataset evaluations from measuring training error through to final target-task performance as described in ML Yearning.
According to ML Yearning, which dataset combination evaluates the algorithm's performance on the task you care about?
The training dev set is drawn from the same distribution as the dev and test sets.
In the four-dataset framework, the _____ dev set evaluates generalization to new data drawn from the training distribution.
Match each evaluation goal to the dataset that measures it in ML Yearning's four-dataset framework.
Order the diagnostic reasoning steps a practitioner follows when using the four-dataset framework to identify an algorithm's key weakness.
Describe how training error, generalization, and target-task performance are evaluated in a four-dataset framework.
Diagnose classifier performance using the four-dataset evaluation framework.
Explain what is measured by evaluating on the training dev set.