Case study: Designing an error analysis spreadsheet for a dog and great cat classifier with mislabeled validation examples.

Case context: A machine learning developer is building a dog and great cat classifier. During error analysis, they suspect that a noticeable fraction of the validation set images carry incorrect ground-truth labels, which could be skewing the measured error rates. They want to find out whether that fraction is large enough to justify cleaning the entire dataset.

Question: Based on the text, what action should the developer take in their error analysis spreadsheet to investigate this, and what specific columns should their spreadsheet contain?

Sample answer: The developer should add a 'Mislabeled' category to the error analysis spreadsheet to track the fraction of examples that are mislabeled. The spreadsheet should be structured with the following six columns: Image, Dog, Great cat, Blurry, Mislabeled, and Comments.
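As a minimal sketch of the spreadsheet described in the sample answer, the Python snippet below stores the six columns as CSV text and computes the fraction of reviewed images tagged with each category. The rows, the "x" tagging convention, and the function name `category_fractions` are assumptions made up for illustration; they do not come from the original text and are not real results.

```python
import csv
import io

# Column layout taken from the sample answer above.
COLUMNS = ["Image", "Dog", "Great cat", "Blurry", "Mislabeled", "Comments"]
CATEGORIES = COLUMNS[1:-1]  # the four tag columns between Image and Comments

# Hypothetical rows, invented purely for illustration (not real results).
# An "x" marks that the category applies to that reviewed image.
SHEET = """Image,Dog,Great cat,Blurry,Mislabeled,Comments
1,x,,,,Hypothetical note
2,,,x,,
3,,x,x,,
4,,,,x,Hypothetical: label looks wrong
"""


def category_fractions(csv_text):
    """Return the fraction of reviewed rows tagged with each category."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: sum(1 for row in rows if row[name].strip()) / len(rows)
        for name in CATEGORIES
    }


fractions = category_fractions(SHEET)
for name, value in fractions.items():
    print(f"{name}: {value:.0%}")
```

One image can carry several tags, so the fractions need not sum to 100%. The 'Mislabeled' fraction is the number the developer would compare against the other categories when deciding whether a dataset cleanup is worth the effort.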

Key points:

  • Add a 'Mislabeled' category to track the fraction of examples that are mislabeled.
  • Use the specific columns: Image, Dog, Great cat, Blurry, Mislabeled, and Comments.

Rubric: A correct response must identify that a 'Mislabeled' category should be added to keep track of the fraction of mislabeled examples. Additionally, it must list the six columns: Image, Dog, Great cat, Blurry, Mislabeled, and Comments.

Updated 2026-05-26

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy
