State the rule for determining class distribution in balanced learning curve subsets.
Question: When constructing small training subsets to plot a learning curve on skewed or many-class data, what rule should you follow to determine the class distribution of each subset?
Sample answer: You should ensure that the fraction of examples from each class in the subset is as close as possible to the overall fraction of that class in the original training set.
Key points:
- Do not select small subsets purely at random for skewed or many-class training data.
- Keep subset class fractions as close as possible to the overall fractions in the original training set.
Rubric: The response must specify that the fraction or proportion of examples for each class in the subset should be as close as possible to the overall fraction in the original training set.
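Example: The rule can be sketched in code as follows. This is a minimal illustration, not a method prescribed by the source: the function name `stratified_subset`, its parameters, and the largest-remainder rounding used to settle fractional counts are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_subset(labels, subset_size, seed=0):
    """Pick indices so each class fraction tracks the full training set.

    Hypothetical helper for illustration; rounding uses the
    largest-remainder method, which is an assumption of this sketch.
    """
    total = len(labels)
    if not 0 <= subset_size <= total:
        raise ValueError("subset_size must be between 0 and len(labels)")

    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Ideal count per class is subset_size * n_c / total. Integer
    # arithmetic keeps the floor and the remainder exact.
    counts = {}
    remainders = {}
    for label, idxs in by_class.items():
        counts[label], remainders[label] = divmod(subset_size * len(idxs), total)

    # Give the slots lost to rounding down to the classes with the
    # largest remainders, so the counts sum to subset_size.
    leftover = subset_size - sum(counts.values())
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for label in ranked[:leftover]:
        counts[label] += 1

    # Within each class, choose the examples at random.
    chosen = []
    for label, idxs in by_class.items():
        chosen.extend(rng.sample(idxs, counts[label]))
    rng.shuffle(chosen)
    return chosen


# Toy data: 20% positive (1), 80% negative (0).
labels = [1] * 20 + [0] * 80
subset = stratified_subset(labels, subset_size=10)
n_positive = sum(labels[i] for i in subset)
print(len(subset), n_positive)
```

With 20% positives in the full set, a subset of 10 contains 2 positives and 8 negatives. If scikit-learn is available, `train_test_split` with its `stratify` argument is a common way to get a similar class-proportional split.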
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
Why use a balanced subset instead of a random subset when drawing small training sets for learning curves on skewed data?
True or False: On skewed or many-class data, balanced subsets—where each class fraction mirrors the original dataset—produce less noisy learning curves than purely random subsets.
On skewed or multi-class training data, you should choose a _____ subset so that each class fraction matches the original training set as closely as possible.
In which situation does Andrew Ng recommend using a balanced subset when constructing training sets for learning curves?
A balanced subset for learning curves ensures each class appears in proportion to its share of the full training set.
To reduce noise in learning curves on skewed or many-class data, Andrew Ng recommends sampling a _____ subset instead of a purely random one.
Match each term related to balanced subset sampling with its correct description.
Order the steps for constructing a single balanced training subset to plot one point on a learning curve.
What is the primary benefit of using balanced subsets when plotting learning curves on skewed or many-class data?
Random sampling of small training subsets always produces smooth learning curves regardless of class distribution.
If 20% of the original training set is positive examples and you draw a balanced subset of 10, you should include _____ positive examples.
Match each data condition to the sampling problem it causes when constructing small learning curve subsets.
Order the reasoning steps for deciding whether and how to apply balanced subset sampling for a learning curve.
Explain how class imbalance affects learning curves and how balanced subsets resolve this.
Diagnose why a learning curve is noisy for a rare disease classifier and propose a fix.