Concept

A Visual Look at Under- and Overfitting Using U.S. States

Below is a representation of under- and overfitting of the boundaries of U.S. states. The data comes from the U.S. Census Bureau. In its original format, the data is a single Keyhole Markup Language (KML) file containing the latitude and longitude coordinates of the borders of U.S. states. The necessary latitude, longitude, and label (state) data were parsed from the KML file using a simple Python script. The main idea here is to understand the bias-variance trade-off and how it relates to under- and overfitting. A high-bias model draws state boundaries that are too simple (underfitting). A high-variance model traces every quirk of the training points (overfitting).
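The original parsing script is not shown. The following is a minimal sketch of how latitude, longitude, and state labels could be pulled from a KML file using only Python's standard library. It makes several assumptions:

* It assumes the KML 2.2 namespace.
* It assumes each state is a `Placemark` whose `<name>` holds the state name. The actual Census Bureau file may store the name under `ExtendedData` instead.
* The sample document and its rounded coordinates are hypothetical.

KML writes each vertex as `lon,lat[,alt]`, so the order is swapped on output.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the standard KML 2.2 namespace.
KML_NS_URI = "http://www.opengis.net/kml/2.2"
KML_NS = {"kml": KML_NS_URI}


def parse_kml_borders(kml_text):
    """Return a list of (latitude, longitude, state) tuples, one per border vertex."""
    root = ET.fromstring(kml_text)
    rows = []
    for placemark in root.iter(f"{{{KML_NS_URI}}}Placemark"):
        # Assumption: the state name lives in the Placemark's <name> element.
        name_el = placemark.find("kml:name", KML_NS)
        state = name_el.text.strip() if name_el is not None and name_el.text else None
        # A state may have several polygons (e.g. islands), so visit every <coordinates>.
        for coords in placemark.iter(f"{{{KML_NS_URI}}}coordinates"):
            for vertex in (coords.text or "").split():
                parts = vertex.split(",")  # KML order: longitude, latitude[, altitude]
                lon, lat = float(parts[0]), float(parts[1])
                rows.append((lat, lon, state))
    return rows


# Hypothetical sample: a rough rectangle standing in for one state's border.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>Colorado</name><Polygon><outerBoundaryIs><LinearRing>
<coordinates>-109.05,41.0,0 -102.05,41.0,0 -102.05,37.0,0 -109.05,37.0,0 -109.05,41.0,0</coordinates>
</LinearRing></outerBoundaryIs></Polygon></Placemark>
</Document></kml>"""

rows = parse_kml_borders(SAMPLE_KML)
print(rows[0])  # prints (41.0, -109.05, 'Colorado')
```

Each resulting tuple is one labeled training point. This is the (latitude, longitude) → state form a classifier would be fit on.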

Additionally, there are two examples of ways to avoid under- and overfitting and produce a much more accurate map, using gradient boosting and random forest classifiers.
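The boosting and random forest models are not reproduced here. As a stand-in, this sketch illustrates the trade-off those ensembles are meant to manage by swapping in a hand-written k-nearest-neighbors classifier. It uses hypothetical data rather than the Census data:

* The points are random (latitude, longitude) pairs.
* Each point is labeled "west" or "east" of an assumed straight border at longitude -104.
* 10% of the labels are flipped to mimic noise.

The three settings of `k` behave as follows:

* With `k = 1` the model memorizes every noisy label, which is overfitting: perfect training accuracy and a drop on unseen points.
* With `k` close to the training-set size it predicts nearly the same class everywhere, which is underfitting.
* A middle value sits between the two.

```python
import random


def make_points(n, seed):
    """Hypothetical toy data: (lat, lon) points split by an assumed border at lon -104,
    with 10% of labels flipped to mimic noise."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        lat = rng.uniform(37.0, 41.0)
        lon = rng.uniform(-109.0, -95.0)
        label = "west" if lon < -104.0 else "east"
        if rng.random() < 0.1:  # label noise
            label = "east" if label == "west" else "west"
        points.append(((lat, lon), label))
    return points


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - query[0]) ** 2 + (p[0][1] - query[1]) ** 2,
    )[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def accuracy(train, data, k):
    """Fraction of points in `data` whose label the k-NN model predicts correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)


train = make_points(200, seed=0)
test = make_points(200, seed=1)

# k=1: high variance (overfits); k=199: high bias (underfits); k=15: in between.
for k in (1, 15, 199):
    print(f"k={k:>3}  train={accuracy(train, train, k):.2f}  test={accuracy(train, test, k):.2f}")
```

The gap between training and test accuracy is the practical signal to watch. A large gap points to overfitting, and low accuracy on both sets points to underfitting.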


Updated 2021-02-10

Tags

Data Science