Relation
Common Practices of Train/Test Set Arrangements in NLP
- Data is divided 80% as training set, 10% as development set, and 10% as test set.
- Test set should be as large as possible, since a small test set may be accidentally unrepresentative.
- A good common practice is to find the smallest test set that gives us enough statistical power to measure a significant difference between models, since we also need the training set to include as much as data.
0
1
Updated 2021-09-30
Tags
Natural language processing
Data Science