Essay

Analyze why expanding the training dataset fails to resolve high training-set error.

Question: According to the course principles, explain why adding more training data is an ineffective technique when a model's training error is significantly higher than the target error, and describe what the developer must focus on instead.

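Sample answer: When training error is well above the target error, the primary issue is high avoidable bias. Adding more training data helps with variance problems (where the model fits the training set but fails to generalize to new data); it does not increase the model's capacity to fit the training set. A model that cannot fit the examples it already has will not fit them better when given more examples of the same kind, so extra data has no significant effect on reducing bias, and training error typically stays flat or even rises slightly as the training set grows. Instead, the developer must first focus on improving the algorithm's performance on the training set, using bias-reduction techniques such as a larger model, before expecting any improvement on the dev or test sets.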
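
The claim that more data cannot cure underfitting can be checked with a small experiment. The sketch below is a hypothetical toy setup, not taken from the course: the true relationship is y = x² with no noise (so the target error is effectively 0), the high-bias model is a straight line y ≈ a + b·x, and the helper names make_data, fit_line and train_mse are illustrative. The same least-squares fit is then given the feature x² to stand in for a higher-capacity model.

```python
# Toy learning-curve experiment (hypothetical setup, not from the course):
# the true function is y = x**2 with no noise, so the target error is 0.

def make_data(n):
    """Return n evenly spaced points x in [-1, 1] with labels y = x**2."""
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    ys = [x * x for x in xs]
    return xs, ys


def fit_line(zs, ys):
    """Closed-form ordinary least squares for y approx a + b*z (one feature)."""
    n = len(zs)
    z_mean = sum(zs) / n
    y_mean = sum(ys) / n
    szz = sum((z - z_mean) ** 2 for z in zs)
    szy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    b = szy / szz
    a = y_mean - b * z_mean
    return a, b


def train_mse(xs, ys, feature):
    """Fit a + b*feature(x) on the training set and return its training MSE."""
    zs = [feature(x) for x in xs]
    a, b = fit_line(zs, ys)
    return sum((a + b * z - y) ** 2 for z, y in zip(zs, ys)) / len(zs)


sizes = [10, 100, 1_000, 10_000]
# Underfitting (high-bias) model: a straight line in x.
linear_errors = {n: train_mse(*make_data(n), feature=lambda x: x) for n in sizes}
# Higher-capacity model: the same regression on the feature x**2.
quadratic_errors = {n: train_mse(*make_data(n), feature=lambda x: x * x) for n in sizes}

for n in sizes:
    print(f"n={n:>6}  line on x: {linear_errors[n]:.4f}   line on x^2: {quadratic_errors[n]:.1e}")
```

Growing the training set a thousandfold leaves the straight line's training error settling near 4/45 ≈ 0.089 (the variance of x² over [-1, 1]) instead of approaching zero, while changing the model itself drives training error to essentially zero. Real datasets have noise and a nonzero target error, so these numbers are only illustrative.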

Key points:

  • Adding more training data helps with variance problems, not bias.
  • A high training error relative to target error indicates avoidable bias.
  • Adding training data has no significant effect on reducing bias.
  • Practitioners must first improve performance on the training set.
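
The diagnosis in these points comes down to simple arithmetic on three error rates. The sketch below assumes the usual decomposition (avoidable bias = training error minus target error; variance = dev error minus training error) and a simplified decision rule of tackling whichever gap is larger; the function name diagnose and the example error rates are hypothetical.

```python
def diagnose(train_error, dev_error, target_error):
    """Split the gap to the target into avoidable bias and variance.

    avoidable bias = training error - target error
    variance       = dev error - training error
    Simplified rule (an assumption): work on whichever gap is larger.
    """
    avoidable_bias = train_error - target_error
    variance = dev_error - train_error
    if avoidable_bias >= variance:
        advice = "reduce bias first: improve performance on the training set"
    else:
        advice = "reduce variance: e.g. add more training data or regularize"
    return avoidable_bias, variance, advice


# Hypothetical error rates in percent.
print(diagnose(15.0, 16.0, 5.0))
# → (10.0, 1.0, 'reduce bias first: improve performance on the training set')
print(diagnose(6.0, 15.0, 5.0))
# → (1.0, 9.0, 'reduce variance: e.g. add more training data or regularize')
```

Only in the second case, where the dev set error is far above the training error, would adding more training data be the natural next step.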

Rubric: The response should clearly state that adding more data helps with variance but not with bias (high training error). It must identify that the model has high avoidable bias and that the developer must prioritize improving training-set performance before dev/test set performance.


Updated 2026-05-26


Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy
