Essay

Explain the rationale for including target-distribution examples in the training set.

Question: In a machine learning project, you have a massive amount of auxiliary data from a different distribution than your dev/test set. Why should you still allocate a portion of your limited target-distribution data to the training set, rather than placing all of it in the dev/test sets? Discuss how this allocation impacts the training process and the evaluation of the model's performance on subsets of the data.

Sample answer: Including target-distribution data in the training set lets the neural network learn directly from the distribution it will be evaluated on. If you train only on auxiliary data, the model may generalize poorly to the target distribution, because it never sees the characteristics specific to that domain. In addition, with target-distribution data in both the training set and a training dev set, you can measure performance on the target-distribution subset of each. If the model performs well on the target-distribution examples it was trained on but poorly on the target-distribution examples in the training dev set, that gap supports the hypothesis that acquiring more target-distribution data would be beneficial.
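
The allocation described in the sample answer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed recipe: the function name `split_with_target_data`, the 50% share of target data moved into training, and the 10% training dev holdout are all hypothetical choices made for the example.

```python
import random


def split_with_target_data(auxiliary, target, target_train_fraction=0.5,
                           training_dev_fraction=0.1, seed=0):
    """Split data so the training pool contains some target-distribution examples.

    All fractions are illustrative assumptions, not recommended values.
    """
    rng = random.Random(seed)

    # Shuffle the scarce target-distribution data before dividing it.
    target = list(target)
    rng.shuffle(target)

    # Move a portion of the target data into the training pool.
    n_target_train = int(len(target) * target_train_fraction)
    target_for_training = target[:n_target_train]

    # The rest of the target data is split evenly into dev and test sets,
    # so both evaluation sets come purely from the target distribution.
    held_out = target[n_target_train:]
    dev = held_out[: len(held_out) // 2]
    test = held_out[len(held_out) // 2:]

    # Tag each example with its source so performance can later be
    # measured on the target-distribution subset alone.
    pool = [(x, "auxiliary") for x in auxiliary]
    pool += [(x, "target") for x in target_for_training]
    rng.shuffle(pool)

    # The training dev set is drawn from the same mixed pool as the
    # training set, but is never used to fit the model.
    n_training_dev = int(len(pool) * training_dev_fraction)
    training_dev = pool[:n_training_dev]
    train = pool[n_training_dev:]

    return {"train": train, "training_dev": training_dev,
            "dev": dev, "test": test}


if __name__ == "__main__":
    auxiliary = [f"aux_{i}" for i in range(1000)]
    target = [f"tgt_{i}" for i in range(100)]
    splits = split_with_target_data(auxiliary, target)
    for name, subset in splits.items():
        print(name, len(subset))
```

Keeping the source tag on each training example is the design choice that matters here: it is what makes it possible to report error on only the target-distribution subset of the training and training dev sets.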

Key points:

  • Helps the neural network learn directly from the target distribution.
  • Enables evaluation of performance specifically on the target distribution during training.
  • Comparing performance on target-distribution training data vs. target-distribution training dev data helps determine if more target-distribution data is needed.
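
The comparison in the last key point can be illustrated with a short Python sketch. It assumes predictions and labels have already been collected for the target-distribution examples in the training set and in the training dev set. The helper names and the 0.05 gap threshold are hypothetical, chosen only for illustration.

```python
def error_rate(predictions, labels):
    """Fraction of examples where the prediction differs from the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute an error rate on an empty set")
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


def target_generalization_gap(train_preds, train_labels,
                              training_dev_preds, training_dev_labels):
    """Error on unseen target examples minus error on seen target examples.

    All four arguments should cover target-distribution examples only.
    """
    seen_error = error_rate(train_preds, train_labels)
    unseen_error = error_rate(training_dev_preds, training_dev_labels)
    return unseen_error - seen_error


def more_target_data_suggested(gap, threshold=0.05):
    """Return True when the gap exceeds an (assumed, arbitrary) threshold."""
    return gap > threshold


if __name__ == "__main__":
    # Toy data: 1 of 10 wrong on seen target examples,
    # 4 of 10 wrong on unseen target examples.
    labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    train_preds = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    training_dev_preds = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
    gap = target_generalization_gap(train_preds, labels,
                                    training_dev_preds, labels)
    print("gap:", round(gap, 3))
    print("more target data suggested:", more_target_data_suggested(gap))
```

A large gap here is evidence, not proof: it indicates the model fits the target examples it has seen better than ones it has not, which is consistent with needing more target-distribution data.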

Rubric: The response should explain that target-distribution data helps the model learn the specific characteristics of the target domain. It should also discuss how having this data in both training and training dev sets allows for specific performance comparisons to validate hypotheses about data requirements.

Updated 2026-06-13

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI