Concept

Data-centric approach: increase consistency of labels

One way to improve data quality is to focus effort on labeling examples as consistently as possible. Finding the best labels for just a few hundred out of 10,000 training examples can produce large gains in performance. Ng's chart on the bottom right shows that reducing label noise in a small-to-medium dataset can raise performance to a level comparable to that of very large datasets, which are less affected by noise.
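One practical way to find the few hundred examples worth relabeling is to look for disagreement between annotators. The sketch below assumes each training example was labeled independently by several annotators, and it flags the examples where they disagree, least consistent first. The function `flag_inconsistent`, its `min_agreement` threshold, and the sample defect labels are hypothetical illustrations, not Ng's method.

```python
from collections import Counter


def flag_inconsistent(labels_by_example, min_agreement=1.0):
    """Return (example_id, agreement) pairs whose agreement is below min_agreement.

    labels_by_example: hypothetical input format mapping each example id
    to the list of labels assigned by different annotators.
    """
    flagged = []
    for example_id, labels in labels_by_example.items():
        if not labels:
            continue  # Nothing to compare for unlabeled examples.
        # Agreement = fraction of annotators who chose the most common label.
        _, top_count = Counter(labels).most_common(1)[0]
        agreement = top_count / len(labels)
        if agreement < min_agreement:
            flagged.append((example_id, agreement))
    # Review the least consistent examples first.
    flagged.sort(key=lambda pair: pair[1])
    return flagged


# Hypothetical annotations: three annotators per image.
annotations = {
    "img_001": ["scratch", "scratch", "scratch"],
    "img_002": ["scratch", "dent", "scratch"],
    "img_003": ["dent", "scratch", "discoloration"],
    "img_004": ["dent", "dent", "dent"],
}

for example_id, agreement in flag_inconsistent(annotations):
    print(f"{example_id}: agreement {agreement:.2f}")
```

With these sample annotations, `img_003` (no two annotators agree) is listed before `img_002` (two of three agree). Once the flagged examples are found, the usual next step is to agree on a clear labeling convention and relabel them, rather than simply keeping the majority vote.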


Updated 2021-04-09

Tags

Data Science