Common Practices of Train/Test Set Arrangements in NLP

  • A common split is 80% of the data for training, 10% for development, and 10% for testing.
  • All else being equal, a larger test set is better, since a small test set may be accidentally unrepresentative.
  • Because the training set also needs as much data as possible, a good practice is to pick the smallest test set that still gives enough statistical power to detect a significant difference between models.
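
The 80/10/10 arrangement above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method: the function name `split_dataset`, its parameters, the fixed seed, and the toy `sentences` list are all assumptions made for the example.

```python
import random


def split_dataset(examples, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle examples and split them into train, dev, and test lists.

    The test set receives whatever remains after the train and dev
    portions are taken, so the three parts always cover the full data.
    """
    items = list(examples)
    # A seeded generator keeps the split reproducible across runs.
    rng = random.Random(seed)
    rng.shuffle(items)

    n_train = int(len(items) * train_frac)
    n_dev = int(len(items) * dev_frac)

    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test


# Hypothetical toy corpus of 1000 sentences.
sentences = [f"sentence {i}" for i in range(1000)]
train, dev, test = split_dataset(sentences)
print(len(train), len(dev), len(test))
```

Shuffling before slicing reduces the chance that the test portion is unrepresentative because of how the data happened to be ordered. The fractions are parameters, so the test share can be reduced when a smaller test set already gives enough statistical power.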

Updated 2021-09-30

Tags

Natural language processing

Data Science