Concept
Data-centric approach: increase consistency of labels
One way to improve data quality is to focus effort on making labels as consistent as possible. Finding the best labels for a few hundred out of 10,000 training examples can yield large gains in performance. Ng's chart on the bottom right shows that reducing label noise in a small-to-medium dataset can raise performance to a level comparable to that of very large datasets, which are less affected by noise.
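A minimal sketch of one way to find where relabeling effort pays off. It assumes each training example has been labeled independently by several annotators. The data layout and the function names `label_agreement` and `flag_inconsistent` are hypothetical. Each example gets a score equal to the fraction of annotators who chose its majority label, and the examples below a threshold are listed least consistent first. This gives a short list of candidates to review instead of the whole training set.

```python
from collections import Counter


def label_agreement(labels):
    """Fraction of annotators who chose the most common label for one example."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


def flag_inconsistent(annotations, threshold=1.0):
    """Return IDs of examples whose agreement is below `threshold`, least consistent first.

    Assumption: `annotations` maps a hypothetical example ID to the list of labels
    that independent annotators gave it. threshold=1.0 flags any disagreement.
    """
    scores = {
        ex_id: label_agreement(labels)
        for ex_id, labels in annotations.items()
        if labels  # skip examples nobody has labeled yet
    }
    flagged = [ex_id for ex_id, score in scores.items() if score < threshold]
    # Lowest agreement first, so reviewers fix the noisiest labels before the rest.
    return sorted(flagged, key=lambda ex_id: scores[ex_id])


if __name__ == "__main__":
    annotations = {
        "img_001": ["cat", "cat", "cat"],
        "img_002": ["cat", "dog", "cat"],
        "img_003": ["dog", "fox", "cat"],
    }
    print(flag_inconsistent(annotations))  # ['img_003', 'img_002']
```

In practice, the flagged examples would go back to the annotators together with a clarified labeling guideline, so that the same ambiguity is resolved the same way everywhere.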

Updated 2021-04-09
Tags
Data Science