Sampling Method and Metric Plotting for Learning Curve Noise Reduction
Question: When averaging learning curves over multiple random subsets to reduce noise, what specific sampling method must be used to select the small training subsets, and which two averaged metrics are plotted?
Sample answer: The small training subsets must be selected by sampling with replacement. A separate model is trained on each subset, and the two metrics plotted are the average training error and the average dev set error across those models.
Key points:
- Use sampling with replacement to select the training subsets.
- Plot the average training error and average dev set error.
Rubric: The answer must identify 'sampling with replacement' as the sampling method and specify both 'average training error' and 'average dev set error' (or average dev error) as the two plotted metrics.
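Illustrative sketch: the procedure in the sample answer can be written out in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation. The toy 1-D data, the nearest-centroid classifier, and the names make_data, train_centroids, error_rate, and averaged_errors are all hypothetical stand-ins chosen to keep the example self-contained. The number of subsets (5) is likewise an assumed value. Only the sampling with replacement and the averaging of training and dev set error come from the answer above.

```python
import random
from statistics import mean


def make_data(n, rng):
    """Hypothetical toy data: label 0 or 1, feature drawn around 0.0 or 2.0."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        data.append((rng.gauss(2.0 * y, 1.0), y))
    return data


def train_centroids(examples):
    """Fit a toy nearest-centroid classifier: one mean feature value per label."""
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def error_rate(centroids, examples):
    """Fraction of examples whose nearest centroid has the wrong label."""
    wrong = sum(
        1
        for x, y in examples
        if min(centroids, key=lambda label: abs(x - centroids[label])) != y
    )
    return wrong / len(examples)


def averaged_errors(train_set, dev_set, subset_size, num_subsets, rng):
    """Train one model per random subset and average both error metrics."""
    train_errors, dev_errors = [], []
    for _ in range(num_subsets):
        # random.Random.choices draws WITH replacement, so an example may repeat.
        subset = rng.choices(train_set, k=subset_size)
        model = train_centroids(subset)  # a different model for each subset
        train_errors.append(error_rate(model, subset))
        dev_errors.append(error_rate(model, dev_set))
    return mean(train_errors), mean(dev_errors)


rng = random.Random(0)
train_set = make_data(200, rng)
dev_set = make_data(100, rng)

# One averaged point per training set size; these pairs are what gets plotted.
for size in (2, 5, 10, 20):
    avg_train, avg_dev = averaged_errors(
        train_set, dev_set, subset_size=size, num_subsets=5, rng=rng  # 5 is an assumed count
    )
    print(f"size={size:3d}  avg train error={avg_train:.3f}  avg dev error={avg_dev:.3f}")
```

Each printed row is one point on the smoothed learning curve: the average training error and the average dev set error for that training set size.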
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
When to Use Noise-Reduction Techniques for Learning Curves
When averaging learning curves over multiple random subsets, what is the correct procedure for each randomly selected small training set?
To reduce noise in a learning curve, you should train a single model on multiple randomly chosen training sets of the same small size.
After training a different model on each random subset of the same small size, you compute and plot the _____ training error and dev set error.
Why does averaging learning curves over multiple random subsets help reveal the true learning trend?
The averaging technique described by Andrew Ng uses sampling with replacement to create the multiple small training sets.
Instead of training just one model on a small set, Ng recommends training _____ different models on different randomly chosen subsets of the same size.
Match each term in the averaging technique to its correct description.
Order the steps of the averaging-over-multiple-random-subsets procedure as described in Machine Learning Yearning.
According to Machine Learning Yearning, approximately how many randomly chosen training subsets should you select when using the averaging technique?
In the averaging technique, a single model is trained jointly on all the randomly chosen small training subsets combined into one larger set.
After training each model on a small subset, you compute both the training error and the _____ error for each model before averaging.
Match each challenge to the element of the averaging technique that directly addresses it.
Order the reasoning steps a practitioner should follow when deciding to apply and interpret the averaging technique for noisy learning curves.
How does averaging learning curves over multiple random subsets reduce noise, and what are the specific steps to execute this technique?
Smoothing a Noisy Dev Set Learning Curve at Small Training Sizes