Determining Whether to Gather More Data
- First, determine whether the performance on the training set is acceptable. If performance on the training set is poor, the learning algorithm is not using the training data that is already available, so there is no reason to gather more data. Instead, try increasing the size of the model by adding more layers or adding more hidden units to each layer.
- Also try improving the learning algorithm, for example by tuning the learning rate hyperparameter.
- If large models and carefully tuned optimization algorithms do not work well, then the problem might be the quality of the training data. This suggests starting over, collecting cleaner data, or collecting a richer set of features.
- If the performance on the training set is acceptable, then measure the performance on a test set. If test set performance is much worse than training set performance, then gathering more data is one of the most effective solutions.
- A simple alternative to gathering more data is to reduce the size of the model or improve regularization, by adjusting hyperparameters such as weight decay coefficients, or by adding regularization strategies such as dropout.
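The checklist above can be read as a small decision procedure. The sketch below is only an illustration of that reading, not a definitive rule: the function name `next_step`, the thresholds `acceptable_error` and `max_gap`, and the flag `capacity_and_tuning_exhausted` are all hypothetical, and what counts as "acceptable" or "much worse" depends on the application.

```python
def next_step(train_error, test_error, acceptable_error, max_gap,
              capacity_and_tuning_exhausted=False):
    """Suggest what to try next, following the checklist above.

    All thresholds are assumptions chosen by the practitioner:
    acceptable_error is the highest training error considered acceptable,
    max_gap is the largest tolerated difference between test and training error.
    """
    # Step 1: poor training performance means more data will not help yet.
    if train_error > acceptable_error:
        if capacity_and_tuning_exhausted:
            # Larger models and tuned optimization already failed.
            return "collect cleaner data or a richer set of features"
        return "increase model size or tune the learning rate"

    # Step 2: training performance is fine, so compare against the test set.
    if test_error - train_error > max_gap:
        return "gather more data, or shrink the model / add regularization"

    # Assumption: if both checks pass, no change is suggested.
    return "no change suggested"


if __name__ == "__main__":
    # Hypothetical numbers: low training error but a large test gap.
    print(next_step(train_error=0.02, test_error=0.15,
                    acceptable_error=0.05, max_gap=0.03))
```

Checking training error first mirrors the order of the list: the gap between test and training error only becomes informative once the model fits the training set acceptably.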
Updated 2021-07-12
Tags
Data Science