Learn Before
Concept

70/30 Train/Test Split Heuristic Does Not Apply to Large Datasets

Using 30% of the data for the test set had been a popular heuristic, and it works well with a modest number of examples, such as 100 to 10,000 examples. For big-data machine learning problems, the fraction of data allocated to dev/test sets has been shrinking even as the absolute number of examples in those sets has been growing. Dev/test sets do not need to be excessively large beyond what is needed to evaluate algorithm performance.

0

1

Updated 2026-05-26

Contributors are:

Who are from:

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI

Learn After