Concept

Pandas .drop_duplicates()

.drop_duplicates() returns a copy of a dataframe with duplicate rows removed. The original dataframe is left unchanged unless inplace=True is passed.

  • The subset parameter limits the comparison to certain columns. By default, all columns are compared, so two rows count as duplicates only if they match in every column.
  • The keep parameter determines which duplicate row is kept, if any. keep='first' keeps the first occurrence and deletes the rest (this is the default), keep='last' keeps the last occurrence and deletes the rest, and keep=False deletes every row that has a duplicate.

Given the pictured dataframe (df),

# Removes all duplicate rows
df.drop_duplicates()

# Removes duplicates with the same "brand"
df.drop_duplicates(subset=['brand'])

# Removes duplicates with the same "brand" and "style"
# Keeps the last duplicate row
df.drop_duplicates(subset=['brand', 'style'], keep='last')
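
The calls above can be tried on a small stand-in dataframe. The sketch below is only an illustration: the rows, the "rating" column, and the result variable names are made up for this example and are not the pictured dataframe.

```python
import pandas as pd

# Hypothetical data, invented for illustration (not the pictured dataframe).
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4.0, 4.0, 3.5, 15.0, 5.0],
})

# Compare all columns; keep the first occurrence (keep='first' is the default).
# Row 1 is identical to row 0, so row 1 is dropped.
all_columns = df.drop_duplicates()

# Compare only "brand"; the first row of each brand is kept.
by_brand = df.drop_duplicates(subset=["brand"])

# Compare "brand" and "style"; keep the last row of each combination.
by_brand_style_last = df.drop_duplicates(subset=["brand", "style"], keep="last")

# keep=False drops every row whose "brand"/"style" combination appears more than once.
no_repeats = df.drop_duplicates(subset=["brand", "style"], keep=False)

# The surviving index labels show which original rows were kept.
print(list(all_columns.index))          # [0, 2, 3, 4]
print(list(by_brand.index))             # [0, 2]
print(list(by_brand_style_last.index))  # [1, 2, 4]
print(list(no_repeats.index))           # [2]
```

The original index labels are preserved in each result, which makes it easy to see which rows survived. The original df still has all five rows, because each call returns a new dataframe.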
Image 0

Updated 2021-05-25

Tags

Python Programming Language

Data Science
