Missing Data Detection and Calculations
Understand how to detect missing data and perform calculations involving them.
Detecting missing data
Before we can manage the missing (or null) values in our data, we first need to be able to detect them accurately. pandas provides several methods and functions for detecting missing values.
In the previous lesson, we learned that NaN isn't considered equal to any value, including itself. This means that trying to find missing data in Series or DataFrame objects by comparing values with np.nan (e.g., using operators like == or >=) will not work.
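A minimal sketch of why the comparison fails. The Series `ages` is a hypothetical example made up for illustration, not part of the lesson's dataset:

```python
import numpy as np
import pandas as pd

# NaN is not equal to anything, including itself
nan_equals_nan = np.nan == np.nan

# Hypothetical Series with one missing value (illustration only)
ages = pd.Series([30, 45, np.nan, 50])

# Comparing with == never flags the missing entry
eq_mask = ages == np.nan

# isnull() flags the missing entry as expected
null_mask = ages.isnull()

print(nan_equals_nan)
print(eq_mask.tolist())
print(null_mask.tolist())
```

The equality mask is False at every position, so it cannot be used to locate the missing entry, while the isnull() mask is True exactly where the value is missing.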
As such, we should instead use the functions in pandas to detect missing values across the different array data types, namely isnull() and notnull().
Note: Both isnull() and notnull() are described as functions in pandas, though they can also be used as methods with pandas objects such as a Series or DataFrame (e.g., df.isnull()).
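To illustrate the note, the sketch below calls isnull() both ways on a small Series. The Series `weights` is an assumed example, not taken from the lesson:

```python
import numpy as np
import pandas as pd

# Hypothetical Series used only for this illustration
weights = pd.Series([70, np.nan, 60])

# Top-level function form
as_function = pd.isnull(weights)

# Method form on the pandas object
as_method = weights.isnull()

# Both forms give the same boolean mask
same_result = as_function.equals(as_method)
print(same_result)

# The function form also accepts a scalar
scalar_check = pd.isnull(np.nan)
print(scalar_check)
```

The method form tends to read more naturally when chaining operations on a Series or DataFrame, while the function form is convenient for scalars or array-like inputs that aren't pandas objects.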
Suppose we have a mock dataset of patient information, as shown below:
Patient Information Dataset with Missing Data
patient_id | Age | Gender | weight_kg | height_cm | cholesterol_mgdl |
123 | 30 | M | 70 | 170 | 200 |
456 | 45 | M | NaN | 165 | 220 |
789 | NaN | F | 60 | NaN | 185 |
321 | 50 | NaN | 80 | 180 | NaN |
654 | 37 | M | 75 | 175 | NaN |
987 | 77 | M | 55 | 160 | 195 |
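The lesson does not show how df is created. One possible way to build the table above as a DataFrame is sketched here, with np.nan standing in for each missing cell. The variable name `df` matches the later snippets, but the construction itself is an assumption:

```python
import numpy as np
import pandas as pd

# Rebuild the patient table above; np.nan marks each missing cell
df = pd.DataFrame({
    "patient_id": [123, 456, 789, 321, 654, 987],
    "Age": [30, 45, np.nan, 50, 37, 77],
    "Gender": ["M", "M", "F", np.nan, "M", "M"],
    "weight_kg": [70, np.nan, 60, 80, 75, 55],
    "height_cm": [170, 165, np.nan, 180, 175, 160],
    "cholesterol_mgdl": [200, 220, 185, np.nan, np.nan, 195],
})

print(df)
```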
We can use isnull() to check whether the DataFrame contains missing values, as shown below:
# Using isnull() as a function to check for missing/null values in df
output = pd.isnull(df)

# View output
print(output)
We can see from the output that isnull() returns a boolean mask of the same shape as the DataFrame, where True indicates a missing value and False indicates a non-missing value. This helps us pinpoint the locations of the cells with missing data and serves as the base for subsequent processing.
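As one example of such subsequent processing, the boolean mask can be summed to count missing values, since True counts as 1 and False as 0. This sketch uses two columns from the patient table, and the name `df_small` is hypothetical:

```python
import numpy as np
import pandas as pd

# Two columns taken from the patient table (df_small is a hypothetical name)
df_small = pd.DataFrame({
    "weight_kg": [70, np.nan, 60, 80, 75, 55],
    "cholesterol_mgdl": [200, 220, 185, np.nan, np.nan, 195],
})

# Boolean mask with the same shape as the DataFrame
mask = df_small.isnull()

# Summing the mask gives the number of missing values per column
missing_per_column = mask.sum()

# Summing again gives the total across the whole DataFrame
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(total_missing)
```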
On the other hand, notnull() returns the opposite mask, where True indicates a non-missing value, as shown below:
# Using notnull() as a function to check for non-missing values in df
output = pd.notnull(df)

# View output
print(output)
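A short check that notnull() is the element-wise inverse of isnull(), using the height_cm values from the patient table. The Series name `heights` is an assumption made for this sketch:

```python
import numpy as np
import pandas as pd

# height_cm values from the patient table (heights is a hypothetical name)
heights = pd.Series([170, 165, np.nan, 180, 175, 160], name="height_cm")

# True where a value is present
not_null_mask = heights.notnull()

# Negating the isnull() mask with ~ gives the same result
is_inverse = not_null_mask.equals(~heights.isnull())

# Summing the mask counts the non-missing values
non_missing_count = int(not_null_mask.sum())

print(is_inverse)
print(non_missing_count)
```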
If we want to select all rows that contain at least one null value, we can ...