Missing Data Handling
Missing data should be handled early in the data processing stage because it can distort regressions and other analyses later in the pipeline. Pandas provides a variety of tools for dealing with missing values.
Missing values are typically stored as np.nan and appear as NaN when a data frame is printed. To drop every row that contains a missing value, call df.dropna(how="any"), which removes all rows with at least one NaN. By default, dropna returns a new data frame and leaves df unchanged.
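As a minimal sketch of the row-dropping behavior described above, the following uses a small made-up data frame (the column names rooms and price and all values are illustrative assumptions, not from any real dataset). It also contrasts how="any" with how="all", which drops a row only when every value in it is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "rooms": [3.0, np.nan, 4.0, np.nan],
    "price": [127500.0, 106000.0, np.nan, np.nan],
})

# how="any" (the default) drops a row if at least one value is missing.
dropped_any = df.dropna(how="any")

# how="all" drops a row only if every value in it is missing.
dropped_all = df.dropna(how="all")

print(dropped_any)
print(dropped_all)
```

Here only the first row survives how="any", while how="all" removes just the last row, which is entirely NaN.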
You can also fill missing values with df.fillna(value=5), which sets every NaN entry in the data frame to 5. Both operations can be applied to a subset of the data frame, such as selected columns, for more targeted handling.
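The sketch below illustrates filling and subset-based handling on a made-up data frame (the column names and values are assumptions for illustration). Filling with a column mean or median is one common choice, shown here only as an example and not as a recommendation from the original text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "rooms": [3.0, np.nan, 4.0],
    "price": [127500.0, np.nan, 178100.0],
})

# Fill every missing entry in the data frame with the same constant.
filled_all = df.fillna(value=5)

# Fill a single column only, here with that column's mean.
filled_rooms = df.copy()
filled_rooms["rooms"] = filled_rooms["rooms"].fillna(filled_rooms["rooms"].mean())

# Pass a dict to use a different fill value for each column.
filled_by_col = df.fillna(value={"rooms": 0, "price": df["price"].median()})

# Drop rows only when the listed columns contain missing values.
dropped_subset = df.dropna(subset=["price"])
```

Like dropna, fillna returns a new data frame by default, so the original df still contains its NaN entries after these calls.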
Tags
Python Programming Language
Data Science
D2L
Dive into Deep Learning @ D2L