Case Study

Analyzing Error Metrics for Data Mismatch

Case context: You are developing an ML system and have tracked its performance across different data splits. You observe a 10% error on the training set, an 11% error on the training-dev set, and a 20% error on the dev set.

Question: Based on these metrics, what should you diagnose as the primary causes of the model's errors, and what issue can you confidently rule out?

Sample answer: You should diagnose the system as suffering from high avoidable bias and data mismatch. The avoidable bias is indicated by the 10% training error, assuming the optimal (unavoidable) error rate for the task is close to 0%. The data mismatch is indicated by the 9-percentage-point jump from 11% training-dev error to 20% dev error. You can rule out high variance on the training-set distribution because the training-dev error is only 1 percentage point higher than the training error.
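The arithmetic behind this diagnosis can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `diagnose`, the `optimal_err` default of 0%, and the 5-percentage-point `threshold` for calling a gap "high" are assumptions made for this example and are not part of the case. In practice the cutoff is a judgment call.

```python
def diagnose(train_err, train_dev_err, dev_err, optimal_err=0.0, threshold=5.0):
    """Split the dev error into three gaps, all in percentage points.

    optimal_err and threshold are assumed values, not given in the case.
    """
    gaps = {
        # Training error above the (assumed) optimal error rate.
        "avoidable_bias": train_err - optimal_err,
        # Extra error on unseen data from the same distribution as training.
        "variance": train_dev_err - train_err,
        # Extra error when moving to the dev-set distribution.
        "data_mismatch": dev_err - train_dev_err,
    }
    # Flag every gap that meets the (assumed) threshold.
    flagged = [name for name, gap in gaps.items() if gap >= threshold]
    return gaps, flagged


# Error rates from the case context, in percent.
gaps, flagged = diagnose(train_err=10, train_dev_err=11, dev_err=20)
for name, gap in gaps.items():
    print(f"{name}: {gap} percentage points")
print("flagged:", flagged)  # ['avoidable_bias', 'data_mismatch']
```

With the case's numbers, the variance gap is 1 percentage point, well under the assumed threshold, which is why high variance on the training-set distribution is ruled out.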

Key points:

  • Diagnose high avoidable bias.
  • Diagnose data mismatch.
  • Rule out high variance on the training-set distribution.

Rubric: A correct diagnosis explicitly states that the model suffers from high avoidable bias and data mismatch, and rules out high variance on the training-set distribution, citing the provided error rates.

Updated 2026-06-12

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI