Missing Data Detection and Calculations

Detecting missing data

Before we can manage the missing (or null) values in our data, we first need to detect them accurately. pandas provides several methods and functions for detecting missing values.

In the previous lesson, we learned that NaN isn’t considered equal to any value, including itself. This means that we cannot find missing data in Series or DataFrame objects by comparing the values with np.nan: comparisons using operators like == or >= evaluate to False for every element, whether or not the value is missing.
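To make this concrete, here is a minimal sketch of what happens when we compare against np.nan directly. The Series `s` and its values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value in the middle
s = pd.Series([1.0, np.nan, 3.0])

# Comparing with np.nan yields False everywhere, even at the missing position
eq_result = s == np.nan
ge_result = s >= np.nan

# NaN is not even equal to itself
nan_equals_itself = np.nan == np.nan

print(eq_result.tolist())
print(ge_result.tolist())
print(nan_equals_itself)
```

Because both comparison masks contain only False, neither of them tells us where the missing value is.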

Instead, we should use the pandas functions isnull() and notnull(), which detect missing values across the different array data types. isnull() returns a Boolean mask that is True wherever a value is missing, and notnull() returns the inverse mask.
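As a rough illustration of detection across data types, the sketch below applies pd.isnull() and pd.notnull() to three small Series. The Series names and values are assumptions chosen for this example: a float Series holding NaN, an object Series holding None, and a datetime Series holding NaT.

```python
import numpy as np
import pandas as pd

# Hypothetical Series of different data types, each with one missing value
floats = pd.Series([1.5, np.nan, 3.0])
objects = pd.Series(["a", None, "c"], dtype=object)
dates = pd.Series(pd.to_datetime(["2024-01-01", None, "2024-01-03"]))

# isnull() flags the missing entry regardless of the underlying type
float_mask = pd.isnull(floats)
object_mask = pd.isnull(objects)
date_mask = pd.isnull(dates)

# notnull() flags the entries that are present
present_mask = pd.notnull(floats)

print(float_mask.tolist())
print(object_mask.tolist())
print(date_mask.tolist())
print(present_mask.tolist())
```

In each case the missing entry is in the second position, and it is flagged whether it is represented as NaN, None, or NaT.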

Note: Both isnull() and notnull() are described as functions in pandas, though they can also be used as methods with pandas objects such as a Series or DataFrame (e.g., df.isnull()).
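The sketch below shows the function form and the method form side by side. The DataFrame `df` and its column names are hypothetical and are not the lesson's dataset; they exist only to show that both forms produce the same Boolean mask.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value per column
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# Function form and method form
mask_from_function = pd.isnull(df)
mask_from_method = df.isnull()

# Both forms give the same mask
same = mask_from_function.equals(mask_from_method)

# notnull() is the element-wise inverse of isnull()
inverse = df.notnull().equals(~df.isnull())

# Summing the mask counts the missing values in each column
missing_per_column = df.isnull().sum()

print(mask_from_method)
print(missing_per_column)
```

The method form is usually more convenient when chaining operations on an existing object, such as counting missing values per column with df.isnull().sum().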

Suppose we have a mock dataset of patient information, as shown below:
