Case Study

Evaluating Training Data Addition for a High Variance Speech Recognition System

Case context: You are developing a speech recognition system that is suffering from high variance. You have identified a massive repository of additional unlabeled voice clips that could be labeled and added to your training set. However, your team's budget for compute resources is nearly exhausted.

Question: Based on Andrew Ng's guidelines, decide whether adding this data is currently a viable strategy to address the system's high variance, and explain why or why not.

Sample answer: Adding more training data is not currently a viable strategy. Although you have access to significantly more data, the nearly exhausted compute budget means you lack the computational power to process it. According to Ng, this remedy is viable only when both conditions hold: access to more data and enough computational power to process it.

Key points:

  • Identify that the system suffers from high variance.
  • Acknowledge that there is access to significantly more data.
  • State that adding data is not viable due to the lack of computational power/compute budget.
  • Recall that computational power and access to data are both required to make this remedy viable.

Rubric: The student must state that adding training data is not currently viable. They must explain that the first requirement (access to significantly more data) is met, but the second requirement (enough computational power to process that data) is not met because the compute budget is nearly exhausted.
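
The decision rule in the sample answer can be sketched as a small check. This is only an illustration under stated assumptions: the names `ProjectState` and `adding_data_is_viable` are hypothetical and do not come from Ng's material, and each requirement is reduced to a simple yes/no flag.

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    """Hypothetical yes/no summary of the situation in the case context."""
    high_variance: bool        # the system is diagnosed with high variance
    more_data_available: bool  # significantly more data can be obtained
    compute_available: bool    # enough compute budget to process that data


def adding_data_is_viable(state: ProjectState) -> bool:
    """Return True only if every requirement for the remedy is met.

    Assumption: the rule is a plain conjunction, as described in the
    sample answer (more data AND enough computational power).
    """
    return (
        state.high_variance
        and state.more_data_available
        and state.compute_available
    )


# The case study: high variance, data is available, compute is not.
case = ProjectState(
    high_variance=True,
    more_data_available=True,
    compute_available=False,
)
verdict = adding_data_is_viable(case)
print("Adding data is viable:", verdict)
```

For the case described here the check fails on the compute requirement alone, which is the point the rubric asks the student to make.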

Updated 2026-05-26

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy