Smoothing a Noisy Dev Set Learning Curve at Small Training Sizes
Case context: You are training a model and plotting its learning curves. At small training sizes (e.g., 10 examples), the plotted training and dev error values fluctuate wildly, making it impossible to tell whether the model suffers from high bias or high variance. You want to apply the random-subset averaging technique, drawing subsets from your original set of 100 training examples.
Question: Based on the provided case context, how should you construct the training subsets, and how will you obtain the final values to plot on the learning curve for the small training size of 10 examples?
Sample answer: To smooth the noise at the small training size of 10 examples, I should select 3 to 10 different training subsets of 10 examples each from the original 100 examples by sampling with replacement, so that each subset is drawn independently and the same example may appear in more than one subset. I will then train a different model on each of these subsets and compute the training error (on that subset) and the dev set error (on the same dev set) for each model. Finally, I will compute the average training error and average dev set error across these models and plot these two averages as the learning-curve values at training size 10.
Key points:
- Construct 3-10 training subsets of 10 examples by sampling with replacement from the original 100 examples.
- Train a different model on each of the constructed subsets.
- Compute individual training and dev set errors for each of the trained models.
- Average the training and dev errors across all models and plot the averages.
Rubric: The response must outline: 1) selecting 3-10 subsets of 10 examples using sampling with replacement from the 100 original examples, 2) training a different model on each subset, 3) calculating training and dev set error for each model, and 4) averaging and plotting the resulting training and dev errors.
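The four steps in the rubric can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the toy 1-D data, the threshold classifier, and the names `make_data`, `fit_threshold`, `error`, and `averaged_errors` are all hypothetical stand-ins for your real dataset and model. It also assumes that "sampling with replacement" means each subset is drawn independently of the others, so subsets may overlap.

```python
import random
from statistics import mean


def make_data(n, rng):
    """Hypothetical toy data: x in [-1, 1], label 1 if x > 0, with 20% label noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = int(x > 0)
        if rng.random() < 0.2:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data


def error(threshold, data):
    """Fraction of examples misclassified by the rule 'predict 1 if x > threshold'."""
    return sum(int(x > threshold) != y for x, y in data) / len(data)


def fit_threshold(train):
    """Stand-in 'model': the candidate threshold with the lowest training error."""
    candidates = [x for x, _ in train] + [float("-inf")]
    return min(candidates, key=lambda t: error(t, train))


def averaged_errors(train_pool, dev, subset_size, n_subsets, rng):
    """Train one model per random subset and average the training and dev errors."""
    train_errs, dev_errs = [], []
    for _ in range(n_subsets):
        # Assumption: each subset is drawn independently of the previous ones,
        # so the same example can appear in more than one subset.
        subset = rng.sample(train_pool, subset_size)
        model = fit_threshold(subset)            # a different model per subset
        train_errs.append(error(model, subset))  # training error on its own subset
        dev_errs.append(error(model, dev))       # dev error on the shared dev set
    return mean(train_errs), mean(dev_errs)


rng = random.Random(0)
train_pool = make_data(100, rng)  # the original 100 training examples
dev = make_data(200, rng)         # a fixed dev set (size chosen arbitrarily here)

# The two values to plot at training size 10, averaged over 5 subsets.
avg_train, avg_dev = averaged_errors(train_pool, dev, subset_size=10, n_subsets=5, rng=rng)
print(f"size=10  avg train error={avg_train:.3f}  avg dev error={avg_dev:.3f}")
```

Repeating the call for other small values of `subset_size` would give one averaged point per training size, which is what makes the curve smoother than a single run per size.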
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
When to Use Noise-Reduction Techniques for Learning Curves
When averaging learning curves over multiple random subsets, what is the correct procedure for each randomly selected small training set?
To reduce noise in a learning curve, you should train a single model on multiple randomly chosen training sets of the same small size.
After training a different model on each random subset of the same small size, you compute and plot the _____ training error and dev set error.
Why does averaging learning curves over multiple random subsets help reveal the true learning trend?
The averaging technique described by Andrew Ng uses sampling with replacement to create the multiple small training sets.
Instead of training just one model on a small set, Ng recommends training _____ different models on different randomly chosen subsets of the same size.
Match each term in the averaging technique to its correct description.
Order the steps of the averaging-over-multiple-random-subsets procedure as described in Machine Learning Yearning.
According to Machine Learning Yearning, approximately how many randomly chosen training subsets should you select when using the averaging technique?
In the averaging technique, a single model is trained jointly on all the randomly chosen small training subsets combined into one larger set.
After training each model on a small subset, you compute both the training error and the _____ error for each model before averaging.
Match each challenge to the element of the averaging technique that directly addresses it.
Order the reasoning steps a practitioner should follow when deciding to apply and interpret the averaging technique for noisy learning curves.
How does averaging learning curves over multiple random subsets reduce noise, and what are the specific steps to execute this technique?
Sampling Method and Metric Plotting for Learning Curve Noise Reduction