Pandas DataFrame Operations - Dealing With Missing and Duplicates

Understand how to manage missing and duplicate data in Pandas DataFrames through detection methods like isnull, data cleaning techniques such as dropna and drop_duplicates, and value imputation with fillna. Learn how to create new columns from existing data to enhance analysis. This lesson equips you to maintain data integrity for effective data science workflows.

We'll cover the following...

- 8. Dealing With Missing Values
  - a. Detecting Null Values
  - b. Dropping Null Values
  - c. Imputation (Filling Null Values)
- 9. Handling Duplicates
- 10. Creating New Columns From Existing Columns
- Jupyter Notebook

8. Dealing With Missing Values

Unlike synthetic or textbook data, real-world data is rarely clean and homogeneous. One issue we need to tackle when working with real data is missing values. The problem is not only that values are missing: different data sources can also indicate missing values in different ways.

The two flavors in which we are likely to encounter missing or null values are:

None: A Python object that is often used to represent missing data in Python code. Because it is a Python object, None can only be stored in arrays with data type ‘object’ (i.e., arrays of Python objects).
NaN (Not a Number): A special floating-point value that is used to represent missing data. Because NaN is itself a floating-point value, an array that contains it keeps a numeric floating-point type, so, unlike an object array holding None, it supports mathematical operations. However, remember that, regardless of the operation, the result of arithmetic with NaN will be another NaN.

Run the examples in the code widget below to understand the difference between the two. Observe that performing arithmetic operations on the array with the None type throws a run-time error while the code executes without errors for NaN:
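
The comparison described above can be sketched as follows. This is a minimal illustration that assumes NumPy is installed; the names `arr_none`, `arr_nan`, and `none_sum_failed` are made up for this example and are not taken from the lesson's own code.

```python
import numpy as np

# An array containing None falls back to the generic 'object' dtype.
arr_none = np.array([1, None, 3, 4])
print(arr_none.dtype)  # object

# Arithmetic on the object array fails, because Python cannot add int and None.
none_sum_failed = False
try:
    arr_none.sum()
except TypeError as err:
    none_sum_failed = True
    print("TypeError:", err)

# An array containing NaN keeps a native floating-point dtype.
arr_nan = np.array([1, np.nan, 3, 4])
print(arr_nan.dtype)  # float64

# Arithmetic runs without errors, but NaN propagates into the result.
print(1 + np.nan)     # nan
print(arr_nan.sum())  # nan

# NaN-aware aggregations skip the missing entries instead.
print(np.nansum(arr_nan))  # 8.0
```

The last call is one way around the propagation behavior: NumPy's NaN-aware aggregations such as `np.nansum` ignore NaN entries rather than returning NaN.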

