.drop_duplicates() removes duplicate rows from a dataframe and returns the result as a new dataframe.
- The subset parameter limits the comparison to certain columns. By default, rows count as duplicates only if they match in all columns.
- The keep parameter determines which duplicate row is kept, if any. keep='first' (the default) keeps the first occurrence and drops the rest, keep='last' keeps the last occurrence and drops the rest, and keep=False drops every row that has a duplicate.

Given the pictured dataframe (df),
```python
# Removes rows that duplicate an earlier row in every column
df.drop_duplicates()
# Removes rows that repeat an earlier "brand" (keeps the first row per brand)
df.drop_duplicates(subset=['brand'])
# Removes rows with the same "brand" and "style",
# keeping the last row of each duplicate group
df.drop_duplicates(subset=['brand', 'style'], keep='last')
```
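Since the pictured dataframe is not reproduced here, the following is a minimal sketch that uses made-up data (the column names and values are assumptions for illustration only) to show how subset and keep change which rows survive.

```python
import pandas as pd

# Hypothetical data standing in for the pictured dataframe
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4.0, 4.0, 3.5, 15.0, 5.0],
})

# Compare all columns; keep the first occurrence (the default)
all_cols = df.drop_duplicates()
# Compare only "brand"; keep the first row for each brand
by_brand = df.drop_duplicates(subset=["brand"])
# Compare "brand" and "style"; keep the last row of each group
keep_last = df.drop_duplicates(subset=["brand", "style"], keep="last")
# Compare "brand" and "style"; drop every row that has a duplicate
drop_all = df.drop_duplicates(subset=["brand", "style"], keep=False)

print(all_cols.index.tolist())   # [0, 2, 3, 4]
print(by_brand.index.tolist())   # [0, 2]
print(keep_last.index.tolist())  # [1, 2, 4]
print(drop_all.index.tolist())   # [2]
```

Note that the surviving rows keep their original index labels, and df itself is unchanged by each call.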

University of Michigan - Ann Arbor

- .drop (Removes values by row or column label)
- .dropna (Removes missing data)
- .drop_duplicates (Removes duplicate rows)

Removing Values from DataFrame

https://pandas.pydata.org/docs/index.html

Pandas documentation

.drop() removes rows or columns from a dataframe by label.
- To specify whether to drop rows or columns, use the axis parameter: axis=0 (the default) drops by row/index label and axis=1 drops by column label.
- Alternatively, pass labels to the index or columns parameter to state explicitly what to drop.
- By default, .drop() returns a new dataframe and leaves the original unchanged. Setting inplace=True modifies the original dataframe instead and returns None.

Given the pictured dataframe (df), if you would like to drop columns “B” and “C” you can run…
```python
#Using axis parameter
df.drop(['B', 'C'], axis=1)
#Using columns parameter
df.drop(columns=['B', 'C'])
```
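As a self-contained check, the sketch below builds a small hypothetical dataframe (the values are assumptions, since the picture is not available) and confirms that the two spellings give the same result.

```python
import pandas as pd

# Hypothetical dataframe with columns A through D
df = pd.DataFrame({"A": [0, 4, 8], "B": [1, 5, 9], "C": [2, 6, 10], "D": [3, 7, 11]})

via_axis = df.drop(["B", "C"], axis=1)
via_columns = df.drop(columns=["B", "C"])

print(via_axis.columns.tolist())     # ['A', 'D']
print(via_axis.equals(via_columns))  # True
print(df.columns.tolist())           # ['A', 'B', 'C', 'D'] (original unchanged)
```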

Given the pictured dataframe (df), if you would like to drop the rows with indices 0 and 1 you can run…
```python
# Using axis parameter
df.drop([0, 1], axis=0)
# Using index parameter
df.drop(index=[0, 1])
```
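The difference between the default behavior and inplace=True is easy to miss, so here is a short sketch on a hypothetical three-row dataframe (the data is an assumption for illustration).

```python
import pandas as pd

# Hypothetical dataframe with the default integer index 0, 1, 2
df = pd.DataFrame({"A": [0, 4, 8], "D": [3, 7, 11]})

# Default: a new dataframe is returned and df is left untouched
rows_dropped = df.drop(index=[0, 1])
print(rows_dropped.index.tolist())  # [2]
print(len(df))                      # 3

# inplace=True: df itself is modified and the call returns None
result = df.drop([0, 1], axis=0, inplace=True)
print(result)                       # None
print(df.index.tolist())            # [2]
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace df with None; assign the result only when inplace is left at its default.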

Pandas .drop()

.dropna() removes rows or columns that contain missing (NA) values from a dataframe.
- The axis parameter determines what is dropped: axis=0 (the default) drops rows and axis=1 drops columns.
- The how parameter determines whether a row or column is dropped when it contains any NA value (how='any', the default) or only when all of its values are NA (how='all').
- The optional thresh parameter sets the minimum number of non-NA values a row or column must have in order to be kept.

```python
# Drops rows with at least one NA value
df.dropna()
# Drops columns with at least one NA value
df.dropna(axis=1)
# Drops rows in which every value is missing
df.dropna(how='all')
# Keeps only rows with at least 3 non-NA values
df.dropna(thresh=3)
```
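To make the effect of each parameter concrete, the sketch below runs these calls on a small hypothetical dataframe (the data is made up for illustration). Row 2 is entirely NA, so the column-wise example first removes it; otherwise every column would contain an NA and all of them would be dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical data: non-NA counts per row are 4, 2, 0, and 3
df = pd.DataFrame({
    "A": [1.0, np.nan, np.nan, 4.0],
    "B": [1.0, np.nan, np.nan, 4.0],
    "C": [1.0, 2.0, np.nan, np.nan],
    "D": [1.0, 2.0, np.nan, 4.0],
})

any_na = df.dropna()              # drop rows with any NA
all_na = df.dropna(how="all")     # drop rows that are entirely NA
at_least_3 = df.dropna(thresh=3)  # keep rows with 3 or more non-NA values
# Drop the all-NA row first, then drop columns that still contain an NA
complete_cols = df.dropna(how="all").dropna(axis=1)

print(any_na.index.tolist())           # [0]
print(all_na.index.tolist())           # [0, 1, 3]
print(at_least_3.index.tolist())       # [0, 3]
print(complete_cols.columns.tolist())  # ['D']
```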
