Case Study

Diagnosing Co-existing Errors in a Speech Recognition System

Case context: A team is building a speech recognition model using synthetic training data. Upon evaluation, they find that the training error is significantly higher than the human-level error, the training-dev error is much higher than the training error, and the validation (dev) error on real-user recordings is even higher than the training-dev error.

Question: Based on the performance metrics of the speech recognition system, diagnose which subset of the three primary error sources (avoidable bias, variance, and data mismatch) this algorithm suffers from, and explain your reasoning.

Sample answer: The algorithm suffers from all three problems at once: high avoidable bias, high variance, and data mismatch. First, high avoidable bias is present because the training error is significantly higher than the human-level error, so the model does not fit even its own training data as well as it could. Second, high variance is present because the training-dev error is much higher than the training error, which shows that the model fails to generalize to unseen data drawn from the same distribution as the training set. Third, data mismatch is present because the dev error is higher than the training-dev error, which shows that the model struggles with the distribution shift between the synthetic training data and the real-user recordings in the dev set.

Key points:

  • Identify high avoidable bias from the gap between human-level error and training error.
  • Identify high variance from the gap between training error and training-dev error.
  • Identify data mismatch from the gap between training-dev error and dev error.
  • Conclude that all three error sources are present simultaneously.
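
The gap-by-gap reasoning in the key points can be sketched in a few lines of Python. This is only an illustration: the class name ErrorProfile, the function diagnose, the 0.02 threshold for calling a gap "large", and the example error rates are all assumptions chosen for the sketch, since the case context gives no numeric values.

```python
from dataclasses import dataclass


@dataclass
class ErrorProfile:
    """Error rates as fractions (hypothetical container, not from the case)."""
    human_level: float   # proxy for the best achievable error
    training: float      # error on the synthetic training set
    training_dev: float  # error on held-out data from the training distribution
    dev: float           # error on real-user recordings


def diagnose(profile: ErrorProfile, threshold: float = 0.02) -> list[str]:
    """Return the error sources whose gap exceeds `threshold`.

    The threshold is an assumed cutoff; in practice what counts as a
    "large" gap is a judgment call that depends on the application.
    """
    gaps = {
        "avoidable bias": profile.training - profile.human_level,
        "variance": profile.training_dev - profile.training,
        "data mismatch": profile.dev - profile.training_dev,
    }
    return [name for name, gap in gaps.items() if gap > threshold]


# Made-up numbers that mirror the pattern described in the case context.
example = ErrorProfile(human_level=0.01, training=0.08,
                       training_dev=0.15, dev=0.22)
print(diagnose(example))
```

With these assumed numbers every gap is about 0.07, so all three sources are flagged, which matches the diagnosis in the sample answer.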

Rubric: The response must correctly identify all three issues (avoidable bias, variance, and data mismatch) as being present in the algorithm's performance profile, citing the specific gaps in the case context that reveal each error.


Updated 2026-05-26

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy
