Concept

Deep learning train/dev/test split

Data fed into deep learning algorithms should be split so that the large majority (well over 70%) goes to the training set. This differs from traditional machine learning, where roughly 70% of the data was typically assigned to the training set. The difference arises because much more data is now available and deep learning algorithms keep improving with more data, whereas the improvement of traditional machine learning algorithms levels off after a certain amount of data.

For example, in a dataset of 1 million examples, you might decide that 10k examples each for dev and test are enough to evaluate which algorithm is better: 98% train, 1% dev, 1% test.
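The 98/1/1 split can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method: the function name `split_indices` and its parameters `dev_frac`, `test_frac`, and `seed` are hypothetical, and it assumes the examples can be shuffled freely (no time ordering or grouping to preserve).

```python
import random


def split_indices(n_examples, dev_frac=0.01, test_frac=0.01, seed=0):
    """Shuffle the indices 0..n_examples-1 and split them into train/dev/test.

    Hypothetical helper for illustration; the fractions default to the
    98% / 1% / 1% split from the example above.
    """
    indices = list(range(n_examples))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)

    n_dev = round(n_examples * dev_frac)
    n_test = round(n_examples * test_frac)
    # The training set receives everything not reserved for dev and test.
    n_train = n_examples - n_dev - n_test

    train = indices[:n_train]
    dev = indices[n_train:n_train + n_dev]
    test = indices[n_train + n_dev:]
    return train, dev, test


train, dev, test = split_indices(1_000_000)
print(len(train), len(dev), len(test))  # 980000 10000 10000
```

The returned lists hold positions into the dataset rather than the data itself, so the same split can be applied to features and labels alike.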

Updated 2021-08-05

Tags

Data Science