Case Study

Dev and Test Set Design for Mobile Image Applications

Case context: A team is building a cat detector app. They have collected 100,000 website images for training. However, the app will run on mobile devices, and the team expects the future data from mobile phone images to differ in nature from the website images. The team decides to take 30% of the website images to serve as their dev and test sets.

Question: According to Machine Learning Yearning, diagnose the problem with the team's split strategy and explain how they should choose their dev and test sets instead.

Sample answer: The problem is that the dev and test sets are drawn from the website image distribution, which differs from the mobile phone image distribution the app will face once deployed. A model tuned against website images can score well on those sets and still perform poorly on mobile uploads. The team should not assume the training distribution matches the test distribution, nor should they simply split off 30% of the available training data. Instead, they should choose dev and test sets that reflect the mobile phone images they expect to receive in the future and want the system to perform well on.

Key points:

  • A 30% split of website images, random or otherwise, does not represent the mobile phone image distribution.
  • Dev and test sets should be chosen to reflect future data (mobile phone images) rather than training data (website images).
  • The team must design the evaluation sets around the distribution they want to perform well on.
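The key points above can be sketched in code. This is a minimal illustration, not a method prescribed by Machine Learning Yearning: it assumes the team has managed to collect a small sample of mobile phone images (the case context does not say they have), and the function name build_splits, the file names, and the 50/50 dev/test ratio are all hypothetical.

```python
import random


def build_splits(website_images, mobile_images, dev_fraction=0.5, seed=0):
    """Train on website images; draw dev and test sets only from mobile images.

    Assumption: mobile_images is a sample of the data the app will see
    after deployment. The 50/50 dev/test ratio is illustrative only.
    """
    rng = random.Random(seed)
    shuffled = list(mobile_images)
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return {
        "train": list(website_images),  # all website images stay in training
        "dev": shuffled[:n_dev],        # target distribution only
        "test": shuffled[n_dev:],       # target distribution only
    }


# Hypothetical file names standing in for real image data.
website = [f"web_{i}.jpg" for i in range(1000)]
mobile = [f"mobile_{i}.jpg" for i in range(200)]
splits = build_splits(website, mobile)
```

The design choice is that no website image ever lands in the dev or test set, so every evaluation number reflects the distribution the team actually cares about.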

Rubric: The answer must identify that a 30% split of website images is problematic because mobile phone images differ in distribution, and state that dev/test sets must reflect the future mobile phone image distribution the system is targeted to perform well on.

Updated 2026-05-26

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy