Case Study

Diagnose and resolve the dataset distribution gap in a mobile cat detector

Case context: You are building a cat image detector. Your training set consists of high-quality, clear internet images of cats. However, your dev set is collected from mobile app users who frequently move their cellphones while taking pictures, resulting in significant motion blur. Consequently, your detector performs poorly on the dev set.

Question: What data modification action should you take to improve the detector's performance on the dev set, and why?

Sample answer: You should apply simulated motion blur to the clear internet images in the training set. Doing so brings the training distribution closer to the dev set distribution, so the model learns features that remain reliable under the motion blur introduced when users move their cellphones while taking pictures.
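
One way the simulated motion blur could be implemented is sketched below. It is a minimal illustration using only NumPy, not a prescribed method: the function names (motion_blur_kernel, apply_motion_blur, random_motion_blur), the blur length range, the angle range, and the probability p are all assumptions made for this example. It also assumes a straight-line blur, whereas real handheld motion can follow curved paths, so the kernel parameters should be tuned against actual dev set images.

```python
import numpy as np


def motion_blur_kernel(length: int, angle_deg: float) -> np.ndarray:
    """Build a normalized line-shaped kernel approximating linear camera motion."""
    if length < 1:
        raise ValueError("length must be >= 1")
    size = length if length % 2 == 1 else length + 1  # odd size keeps a center pixel
    kernel = np.zeros((size, size), dtype=np.float64)
    center = size // 2
    theta = np.deg2rad(angle_deg)
    # Sample points along a line through the center and mark the nearest pixels.
    for t in np.linspace(-center, center, 4 * size):
        row = int(round(center - t * np.sin(theta)))
        col = int(round(center + t * np.cos(theta)))
        kernel[row, col] = 1.0
    return kernel / kernel.sum()


def apply_motion_blur(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Blur an HxW or HxWxC image with the kernel, using edge padding.

    Returns a float64 array with the same shape and value range as the input.
    """
    pad = kernel.shape[0] // 2
    is_gray = image.ndim == 2
    img = image[..., None] if is_gray else image
    img = img.astype(np.float64)
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    # Weighted sum of shifted copies of the image (a cross-correlation with the kernel).
    for i, j in zip(*np.nonzero(kernel)):
        out += kernel[i, j] * padded[i:i + h, j:j + w, :]
    return out[..., 0] if is_gray else out


def random_motion_blur(image, rng, max_length=15, p=0.5):
    """With probability p, blur the image with a random length and direction.

    max_length and p are illustrative defaults, not values from the case study.
    """
    if rng.random() >= p:
        return image.astype(np.float64)
    length = int(rng.integers(3, max_length + 1))
    angle = float(rng.uniform(0.0, 180.0))
    return apply_motion_blur(image, motion_blur_kernel(length, angle))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clear_image = rng.random((64, 64, 3))  # stand-in for a clear training image
    blurred = random_motion_blur(clear_image, rng, p=1.0)
    print(blurred.shape)
```

Applying the blur to only a fraction of the training images (p below 1) is a design choice that keeps some clear examples in the mix; whether that helps or hurts on the blurry dev set is something to check empirically.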

Key points:

  • Apply simulated motion blur to the clear training images.
  • Match the training set distribution to the blurry dev set distribution.
  • Address the mismatch caused by mobile users moving their cellphones.

Rubric: Learners should identify simulated motion blur as the modification to apply to the training set and explain that it reduces the distribution gap between the clear training images and the blurry dev set images.

Updated 2026-05-26

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy
