Applying human comparison to a superhuman medical imaging system
Case context: Your ML system for detecting anomalies in X-rays surpasses human radiologists in average accuracy across the dev/test set. However, error analysis reveals one subset, images of pediatric patients, on which human radiologists are still more accurate than your algorithm.
Question: How should you use this pediatric image subset to drive further progress, given that your system's average performance is already superhuman?
Sample answer: Because human radiologists still outperform the algorithm on pediatric images, human-comparison techniques still apply to this subset. We can obtain higher-quality labels from pediatric radiologists, draw on their intuition to understand why they correctly identified anomalies the system missed, and use their accuracy on pediatric images as the performance target for this subset.
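Finding a subset like this starts with breaking dev-set accuracy down by subset for both the model and the radiologists. The sketch below is a minimal illustration. It assumes that every dev example carries a ground-truth label, the model's prediction, and a radiologist's prediction. The `Example` record, its field names, the helper functions, and the toy data are all hypothetical and exist only to show the comparison.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record for one dev-set X-ray.
    subset: str       # e.g. "adult" or "pediatric"
    label: int        # ground truth: 1 = anomaly, 0 = normal
    model_pred: int   # the algorithm's prediction
    human_pred: int   # a radiologist's prediction


def overall_accuracy(examples):
    """Return (model_accuracy, human_accuracy) over the whole dev set."""
    n = len(examples)
    model_acc = sum(ex.model_pred == ex.label for ex in examples) / n
    human_acc = sum(ex.human_pred == ex.label for ex in examples) / n
    return model_acc, human_acc


def accuracy_by_subset(examples):
    """Return {subset: (model_accuracy, human_accuracy)}."""
    # subset -> [count, model_correct, human_correct]
    counts = defaultdict(lambda: [0, 0, 0])
    for ex in examples:
        c = counts[ex.subset]
        c[0] += 1
        c[1] += ex.model_pred == ex.label
        c[2] += ex.human_pred == ex.label
    return {s: (m / n, h / n) for s, (n, m, h) in counts.items()}


def human_better_subsets(examples):
    """Subsets where humans still beat the model, so human comparison still applies."""
    return sorted(
        s for s, (model_acc, human_acc) in accuracy_by_subset(examples).items()
        if human_acc > model_acc
    )


# Toy data: the model wins on average but loses on pediatric images.
dev = [
    Example("adult", 1, 1, 1),
    Example("adult", 0, 0, 1),
    Example("adult", 1, 1, 0),
    Example("pediatric", 1, 0, 1),
    Example("pediatric", 0, 0, 0),
]
print(overall_accuracy(dev))      # (0.8, 0.6): superhuman on average
print(human_better_subsets(dev))  # ['pediatric']
```

The key point is that the average hides the subset. Overall, the model scores 0.8 against 0.6 for the radiologists. On the pediatric slice, however, the ordering reverses. That reversal is the signal that labels, intuition, and targets from humans are still useful for this slice.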
Key points:
- Recognize the pediatric subset as a domain where human-comparison techniques still apply.
- Obtain better labels from humans for pediatric images.
- Use human intuition to analyze system errors on pediatric images.
- Set human performance on pediatric images as the desired target.
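After the pediatric subset has been identified, two of the benefits above translate directly into simple dev-set queries. The first query lists the examples that radiologists got right and the model got wrong, which are the cases to review with pediatric specialists. The second computes the human accuracy target for the subset and the model's remaining gap to it. As before, this is a minimal sketch, and the `Example` fields, function names, and image IDs are hypothetical. If pediatric radiologists have supplied better labels, those labels would replace `label` before running either query.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record for one dev-set X-ray.
    image_id: str
    subset: str       # e.g. "adult" or "pediatric"
    label: int        # ground truth: 1 = anomaly, 0 = normal
    model_pred: int   # the algorithm's prediction
    human_pred: int   # a radiologist's prediction


def human_right_model_wrong(examples, subset):
    """IDs of examples in `subset` where the human is right and the model is wrong.

    These are the cases to walk through with pediatric radiologists to learn
    what they notice that the model misses.
    """
    return [
        ex.image_id for ex in examples
        if ex.subset == subset
        and ex.human_pred == ex.label
        and ex.model_pred != ex.label
    ]


def gap_to_human_target(examples, subset):
    """Return (human_accuracy_target, target_minus_model_accuracy) for `subset`."""
    rows = [ex for ex in examples if ex.subset == subset]
    human_acc = sum(ex.human_pred == ex.label for ex in rows) / len(rows)
    model_acc = sum(ex.model_pred == ex.label for ex in rows) / len(rows)
    return human_acc, human_acc - model_acc


dev = [
    Example("img-01", "pediatric", 1, 0, 1),
    Example("img-02", "pediatric", 0, 0, 0),
    Example("img-03", "pediatric", 1, 0, 1),
    Example("img-04", "pediatric", 0, 1, 0),
    Example("img-05", "adult", 1, 0, 1),
]
print(human_right_model_wrong(dev, "pediatric"))  # ['img-01', 'img-03', 'img-04']
print(gap_to_human_target(dev, "pediatric"))      # (1.0, 0.75)
```

Restricting both queries to the subset is intentional. On the full dev set, the model's superhuman average would make the human target look already met, even though a large gap remains on pediatric images.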
Rubric: Evaluates whether the answer applies the three benefits of human comparison (labels, intuition, targets) specifically to the pediatric image subset.
References
Machine Learning Yearning (Deeplearning.ai)