Data Cleaning
Learn how to perform data cleaning in pandas.
Dropping duplicates
Many datasets have duplicate entries.
The drop_duplicates method removes rows that repeat an earlier row. The keep parameter determines which occurrence survives: the default, 'first', keeps the first occurrence; 'last' keeps the last one; and False removes every duplicated row, including the initial occurrence. Note that the result keeps the original index labels. Let’s see if there are any duplicates in our dataset.
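As a minimal sketch of that check, the example below builds a small, hypothetical DataFrame (the lesson's own dataset is not shown here, so the column names and values are assumptions) and compares the three keep settings. It uses duplicated to count repeated rows before dropping them.

```python
import pandas as pd

# Hypothetical sample data; stands in for the lesson's dataset.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Ann", "Cid", "Bob"],
        "score": [90, 85, 90, 70, 85],
    }
)

# Count the rows that repeat an earlier row.
n_duplicates = int(df.duplicated().sum())

# Compare the three settings of the keep parameter.
keep_first = df.drop_duplicates()             # default: keep='first'
keep_last = df.drop_duplicates(keep="last")   # keep the last occurrence
keep_none = df.drop_duplicates(keep=False)    # drop every duplicated row

print("duplicate rows:", n_duplicates)
print("keep='first' index:", keep_first.index.tolist())
print("keep='last' index:", keep_last.index.tolist())
print("keep=False index:", keep_none.index.tolist())
```

The printed index lists show that each result retains the labels of the surviving rows rather than renumbering them. If a fresh 0-based index is wanted, calling reset_index(drop=True) on the result is one option.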