Pandas .drop_duplicates()
.drop_duplicates() removes duplicate rows from a DataFrame. It returns a new DataFrame and, by default, leaves the original unchanged.
- The subset parameter limits the comparison to certain columns, so rows count as duplicates when they match in those columns only. By default (subset=None), all columns are compared.
- The keep parameter determines which of the duplicate rows to keep, if any:
  - keep='first' is the default. It keeps the first occurrence and drops the rest.
  - keep='last' keeps the last occurrence and drops the rest.
  - keep=False drops every row that has a duplicate.
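The sketch below shows how subset and keep interact. It uses a small DataFrame whose values are made up for illustration and chosen so that each keep option gives a different result. It assumes pandas is installed.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical; rows 0, 1 and 3 share brand "A"
df = pd.DataFrame({
    "brand": ["A", "A", "B", "A"],
    "style": ["cup", "cup", "cup", "pack"],
})

# subset=["brand"] compares only the "brand" column
first = df.drop_duplicates(subset=["brand"], keep="first")     # keeps rows 0 and 2
last = df.drop_duplicates(subset=["brand"], keep="last")       # keeps rows 2 and 3
dropped_all = df.drop_duplicates(subset=["brand"], keep=False)  # keeps only row 2

# With no subset, all columns are compared and keep defaults to "first"
default = df.drop_duplicates()  # keeps rows 0, 2 and 3

print(first.index.tolist(), last.index.tolist(),
      dropped_all.index.tolist(), default.index.tolist())
```

The surviving rows keep their original index labels. This is why printing the index is a quick way to see which occurrences were retained.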
Given the pictured dataframe (df), which has "brand" and "style" columns:
# Removes rows that are exact duplicates across all columns (keeps the first)
df.drop_duplicates()
# Keeps one row per "brand" (the first occurrence)
df.drop_duplicates(subset=['brand'])
# Keeps one row per "brand" and "style" combination,
# choosing the last occurrence of each
df.drop_duplicates(subset=['brand', 'style'], keep='last')
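The pictured dataframe is not reproduced in this text. The runnable sketch below therefore uses a hypothetical stand-in with "brand", "style" and "rating" columns. Its values are assumptions, not the original picture's data. It runs the same three calls and prints which index labels survive each one.

```python
import pandas as pd

# Hypothetical stand-in for the pictured dataframe (the real values are not available)
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4.0, 4.0, 3.5, 15.0, 5.0],
})

# Exact duplicates across all columns: row 1 repeats row 0, so it is dropped
all_cols = df.drop_duplicates()

# One row per brand, keeping the first occurrence of each
by_brand = df.drop_duplicates(subset=["brand"])

# One row per (brand, style) pair, keeping the last occurrence of each
by_brand_style = df.drop_duplicates(subset=["brand", "style"], keep="last")

# Each call returns a new DataFrame; df itself still has all 5 rows
print(all_cols.index.tolist())        # [0, 2, 3, 4]
print(by_brand.index.tolist())        # [0, 2]
print(by_brand_style.index.tolist())  # [1, 2, 4]
```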

Updated 2021-05-25
Tags
Python Programming Language
Data Science