Balanced Subsets for Noisy Learning Curves in Skewed or Many-Class Data
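When training data is skewed toward one class or has many classes, a small subset drawn purely at random may contain few or no examples of a rare class, which makes the learning curve noisy. The curve can be less noisy if each small subset is balanced, meaning its class fractions are as close as possible to the overall fractions in the original training set.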
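As a rough illustration of the idea, here is a minimal sketch of drawing one balanced subset (often called a stratified sample). The function name `balanced_subset`, its parameters, and the largest-remainder rounding rule are assumptions made for this example, not a prescribed method; it uses only the Python standard library.

```python
import random
from collections import defaultdict


def balanced_subset(labels, size, seed=0):
    """Return indices of a subset whose class fractions track those of `labels`.

    Hypothetical helper for illustration; rounding uses the largest-remainder rule.
    """
    n = len(labels)
    if not 0 < size <= n:
        raise ValueError("size must be between 1 and len(labels)")
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Ideal count per class is size * class_count / n; split it into the
    # whole part and the remainder using exact integer arithmetic.
    counts, remainders = {}, {}
    for label, idxs in by_class.items():
        counts[label], remainders[label] = divmod(size * len(idxs), n)

    # Hand the leftover slots to the classes with the largest remainders.
    leftover = size - sum(counts.values())
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for label in ranked[:leftover]:
        counts[label] += 1

    # Sample without replacement inside each class, then shuffle the result.
    chosen = []
    for label, k in counts.items():
        chosen.extend(rng.sample(by_class[label], k))
    rng.shuffle(chosen)
    return chosen


# Example: 20% positive examples in a training set of 100, subset of 10.
labels = [1] * 20 + [0] * 80
subset = balanced_subset(labels, size=10)
positives = sum(labels[i] for i in subset)
print(positives)  # 2
```

When the ideal per-class counts are not whole numbers, some rounding rule is needed; with very small subsets and many classes, the rarest classes can still round down to zero examples, so the match to the original fractions is only as close as the subset size allows.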
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Learn After
Why use a balanced subset instead of a random subset when drawing small training sets for learning curves on skewed data?
True or False: On skewed or many-class data, balanced subsets—where each class fraction mirrors the original dataset—produce less noisy learning curves than purely random subsets.
On skewed or multi-class training data, you should choose a _____ subset so that each class fraction matches the original training set as closely as possible.
In which situation does Andrew Ng recommend using a balanced subset when constructing training sets for learning curves?
A balanced subset for learning curves ensures each class appears in proportion to its share of the full training set.
To reduce noise in learning curves on skewed or many-class data, Andrew Ng recommends sampling a _____ subset instead of a purely random one.
Match each term related to balanced subset sampling with its correct description.
Order the steps for constructing a single balanced training subset to plot one point on a learning curve.
What is the primary benefit of using balanced subsets when plotting learning curves on skewed or many-class data?
Random sampling of small training subsets always produces smooth learning curves regardless of class distribution.
If 20% of the original training set is positive examples and you draw a balanced subset of 10, you should include _____ positive examples.
Match each data condition to the sampling problem it causes when constructing small learning curve subsets.
Order the reasoning steps for deciding whether and how to apply balanced subset sampling for a learning curve.
Explain how class imbalance affects learning curves and how balanced subsets resolve this.
Diagnose why a learning curve is noisy for a rare disease classifier and propose a fix.
State the rule for determining class distribution in balanced learning curve subsets.