Evaluating Training Data Addition for a High-Variance Speech Recognition System
Case context: You are developing a speech recognition system that is suffering from high variance. You have identified a massive repository of additional unlabeled voice clips that could be labeled and added to your training set. However, your team's budget for compute resources is nearly exhausted.
Question: Based on Andrew Ng's guidelines, decide whether adding this data is currently a viable strategy to address the system's high variance, and explain why or why not.
Sample answer: Adding more training data is not currently a viable strategy. Although you have access to significantly more data, you lack the computational power needed to process it because the compute budget is nearly exhausted. According to Ng, this remedy is viable only when both conditions hold: access to significantly more data and enough computational power.
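The two-condition rule in the sample answer can be written as a small check. This is a minimal sketch, not Ng's own formulation: the class `DataRemedyContext` and the functions `unmet_conditions` and `adding_data_is_viable` are hypothetical names, and the boolean flags assume the team has already judged whether each condition holds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataRemedyContext:
    """Hypothetical summary of the two conditions Ng names for this remedy."""
    has_significantly_more_data: bool
    has_enough_compute: bool


def unmet_conditions(ctx: DataRemedyContext) -> List[str]:
    """Return the conditions that block adding training data (empty if none)."""
    missing = []
    if not ctx.has_significantly_more_data:
        missing.append("access to significantly more data")
    if not ctx.has_enough_compute:
        missing.append("enough computational power")
    return missing


def adding_data_is_viable(ctx: DataRemedyContext) -> bool:
    # Both conditions are required; failing either one rules the remedy out.
    return not unmet_conditions(ctx)


# The speech recognition case: plenty of extra clips, but the compute budget is nearly gone.
speech_case = DataRemedyContext(has_significantly_more_data=True, has_enough_compute=False)
print(adding_data_is_viable(speech_case))  # False
print(unmet_conditions(speech_case))       # ['enough computational power']
```

Returning the list of unmet conditions, rather than only a boolean, mirrors what the rubric asks for: stating not just that the remedy is ruled out but which requirement fails.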
Key points:
- Identify that the system suffers from high variance.
- Acknowledge that there is access to significantly more data.
- State that adding data is not viable due to the lack of computational power/compute budget.
- Recall that computational power and access to data are both required to make this remedy viable.
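The first key point, identifying high variance, is commonly done by comparing training-set and dev-set error. The sketch below assumes a common rule of thumb: when the optimal error rate is near zero, bias is roughly the training error and variance is roughly the gap between dev and training error. The function names, the `threshold` value, and the error rates in the usage lines are all hypothetical and chosen only for illustration.

```python
from typing import Tuple


def estimate_bias_and_variance(train_error: float, dev_error: float) -> Tuple[float, float]:
    """Rough split of dev-set error, assuming the optimal error rate is near zero.

    bias     ~ training-set error
    variance ~ gap between dev-set error and training-set error
    """
    bias = train_error
    variance = dev_error - train_error
    return bias, variance


def has_high_variance(train_error: float, dev_error: float, threshold: float = 0.05) -> bool:
    # `threshold` is a hypothetical cutoff; pick one that fits the application.
    _, variance = estimate_bias_and_variance(train_error, dev_error)
    return variance > threshold


# Hypothetical error rates for illustration only (not measurements from the case).
print(has_high_variance(train_error=0.02, dev_error=0.15))  # True: gap of about 0.13
print(has_high_variance(train_error=0.10, dev_error=0.12))  # False: gap of about 0.02
```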
Rubric: The student must state that adding training data is not currently viable. They must explain that while the requirement of having significantly more data is met, the requirement of having enough computational power to process it is not, because the compute budget is nearly exhausted.
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
What does adding more training data primarily accomplish for a model with high variance?
True or False: Adding more training data is the simplest and most reliable way to address high variance.
Adding more training data can usually reduce _____ without affecting bias.
According to Andrew Ng, what is the simplest and most reliable way to address high variance in a machine learning model?
True or False: Adding training data typically reduces variance while also increasing bias.
Adding training data is the simplest and most _____ way to address high variance.
Match each concept to its role in Ng's guidance on adding training data to reduce variance.
Order the steps a practitioner follows when deciding to add training data as a fix for high variance.
Which pair of conditions does Andrew Ng identify as both necessary for adding training data to be a viable variance remedy?
True or False: Ng recommends adding training data for high variance only after simpler fixes like regularization have been exhausted.
Adding training data typically reduces _____ without affecting bias.
Match each descriptor to the claim Ng makes about adding training data as a variance remedy.
Order the steps for verifying that added training data successfully reduced variance without harming bias.
Analyzing the Impact of Adding Training Data on Bias and Variance
Effect of Adding Training Data on Model Bias