Case Study

Evaluate data distribution strategies for a new health tech startup.

Case context: Your startup is building a system to diagnose skin conditions from smartphone photos. You have a large dataset of clinical images (high quality) and a smaller dataset of user-submitted photos (varying quality). The engineering team suggests investing time in domain adaptation to train on clinical images and generalize to user photos.

Question: Based on Machine Learning Yearning, should you adopt the engineering team's suggestion to focus on domain adaptation to achieve your immediate application goals? Explain your decision.

Sample answer: No, you should not focus on domain adaptation. Because your goal is to make progress on a specific machine learning application, you should instead choose dev and test sets drawn from the same distribution (the user-submitted photos). Domain adaptation methods are usually applicable only in special problem types and focusing on same-distribution sets will make the team more efficient.

Key points:

  • The goal is practical application progress, not general research.
  • Domain adaptation is less widely used and often restricted to special problem types.
  • The recommended approach is to draw dev and test sets from the same target distribution.
  • Focusing on same-distribution sets improves team efficiency.

Rubric: The response should advise against the domain adaptation approach for this specific application, citing the need for team efficiency and recommending same-distribution dev/test sets.

0

1

Updated 2026-05-27

Contributors are:

Who are from:

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI