Case Study

Optimizing Features for a Medical Image Classifier

Case context: You are building a diagnostic model with a training set of only 500 patient records. You initially extract 10,000 features per record, and the model suffers from high variance. Your colleague suggests two ways to address this: drop just 500 features, or make a 10x reduction down to 1,000 features.

Question: As a practitioner, how should you evaluate your colleague's suggestions for feature reduction, and what approach is most justified given the dataset size?

Sample answer: With only 500 training records, feature selection is well justified and can be very useful for reducing variance. Dropping only 500 of the 10,000 features (a 5% cut) is unlikely to have a large effect on bias, and for the same reason it is unlikely to reduce variance much. A 10x reduction to 1,000 features is far more likely to have a significant effect on variance, but it may also increase bias if too many useful features are excluded. The 10x reduction is therefore the better way to tackle the high variance, provided the useful features are retained.
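The scale of the two suggestions is easier to compare as features per training record. The short sketch below only restates the case's own numbers; the variable names are illustrative.

```python
# Numbers taken from the case context; names are illustrative only.
n_records = 500
options = {"original": 10_000, "drop 500": 9_500, "10x reduction": 1_000}

# Features available per training record under each option.
ratios = {name: n_features / n_records for name, n_features in options.items()}

for name, n_features in options.items():
    share_removed = 1 - n_features / options["original"]
    print(f"{name}: {n_features} features, "
          f"{ratios[name]:.0f} per record, {share_removed:.0%} removed")
```

Dropping 500 features moves the ratio from 20 to 19 features per record, while the 10x reduction brings it down to 2. Even then, features still outnumber records, so this ratio is a rough guide rather than a guarantee that the variance problem is solved.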

Key points:

  • Feature selection is highly appropriate when the training dataset is small.
  • A slight reduction in features is unlikely to change bias much, or to reduce variance enough to matter.
  • A significant reduction is more likely to have a substantial effect on variance.
  • Care must be taken during significant reductions to avoid discarding useful features and increasing bias.
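One simple way to carry out a 10x reduction while trying to keep the useful features is a univariate filter: score each feature against the label and keep the top tenth. The sketch below is a minimal illustration under stated assumptions, not the method prescribed by the case. The scoring rule (absolute Pearson correlation), the helper names `abs_correlation` and `select_top_k`, and the synthetic data are all hypothetical, and the data is scaled down from 500 x 10,000 so that it runs quickly in plain Python.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return abs(cov) / (var_x * var_y) ** 0.5


def select_top_k(rows, labels, k):
    """Return the indices of the k features most correlated with the labels."""
    n_features = len(rows[0])
    scores = [
        abs_correlation([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 60 records, 200 features, and by construction
# only the first 5 features depend on the label. The rest are pure noise.
rng = random.Random(0)
n_records, n_features, n_informative = 60, 200, 5
labels = [i % 2 for i in range(n_records)]
rows = [
    [
        (3.0 * label if j < n_informative else 0.0) + rng.gauss(0.0, 1.0)
        for j in range(n_features)
    ]
    for label in labels
]

# 10x reduction: keep 20 of the 200 features.
kept = select_top_k(rows, labels, k=n_features // 10)
print(len(kept), kept[:n_informative])
```

In practice the scores should be computed on training data only, for example inside each cross-validation fold, so that the selection step does not see the records used to estimate variance. Checking whether known-useful features survive the cut, as the last line does for the toy data, is one way to watch for the bias risk noted above.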

Rubric: Award full credit if the response correctly identifies that feature selection is appropriate for a small dataset, notes that a small reduction won't have much effect, and explains that a significant reduction is needed but carries the risk of increasing bias if useful features are lost.

Updated 2026-06-07

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI