Diagnosing Classifier Ranking Mismatch in Spam Detection
Case context: A development team is building a spam classifier. Their development set and accuracy metric rank Classifier A higher than Classifier B. However, the team manually reviews the results and realizes Classifier B is actually superior for the product because Classifier A lets too many highly offensive emails pass through. The team suspects their evaluation framework has missed the mark.
Question: Given this scenario, identify the warning sign that occurred, name the most likely cause of the issue among the three outlined by Andrew Ng, and state what change the team must make and what action they must take immediately after.
Sample answer: The warning sign is that the dev set plus metric ranks Classifier A higher, but the team thinks Classifier B is superior for the product. The most likely cause is that the evaluation metric is measuring something other than what the project needs to optimize (accuracy alone doesn't capture the product's need to block offensive emails). The team must quickly change the evaluation metric to align with their true objectives, and then make sure the entire team knows about the new direction.
Key points:
- Identify the discrepancy in classifier ranking as the key warning sign.
- Diagnose that the metric is no longer measuring what is most important to the project.
- Recommend changing the metric quickly.
- Ensure the team is informed of the new direction.
Rubric: The response must identify the classifier ranking discrepancy as the warning sign, specify that the metric optimizes the wrong objective as the cause, recommend changing the metric, and state the necessity of informing the team about the new direction.
0
1
References
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
Dev/Test Set Distribution Not Representative of Actual Distribution
Dev Set Overfitting from Repeated Evaluation
Metric Optimizes the Wrong Project Objective
Which scenario is a warning sign that your dev/test set or evaluation metric needs to change?
True or False: Discovering that your initial dev/test set or metric missed the mark is a serious setback that cannot be easily corrected.
If your _____ is no longer measuring what is most important to you, Ng recommends changing it rather than continuing to optimize for it.
Which of the following is the key warning sign that your dev/test set or evaluation metric may need to be changed?
Ng considers discovering that a dev/test set or metric missed the mark to be a serious setback that requires restarting the evaluation process from scratch.
If your metric is no longer measuring what is most important to your project, you should change the _____.
Match each cause of a dev set/metric incorrectly ranking classifiers to the fix Ng recommends.
Order the steps a team should take upon discovering their dev/test set or metric is no longer guiding them correctly.
Your dev set contains formal customer emails but users primarily submit short social media posts. Which cause does this best illustrate?
After changing your dev/test sets or evaluation metric, updating the project files is sufficient — there is no need to explicitly inform the team of the new direction.
If you have overfit to the dev set, Ng recommends getting more _____ data.
Match each problem scenario to the cause category it represents in Ng's framework for incorrect classifier ranking.
Order the reasoning steps for deciding whether and how to change an evaluation metric that may no longer reflect project goals.
Analyze the warning signs and causes of a development set incorrectly ranking classifiers
Diagnosing Classifier Ranking Mismatch in Spam Detection
Response to Overfitting the Development Set