Filtering
Learn how to use where() and mask() for filtering and replacing data based on boolean indexing.
Recap of boolean indexing
Before we dive into filtering numerical values with the pandas methods where() and mask(), it’ll be good to revisit the concept of boolean indexing. Boolean indexing is the technique of selecting data from a DataFrame based on an array of True/False values, so that only the elements of the original data whose corresponding element in that array is True are selected.
This array of True/False values is known as a boolean mask, and it has the same shape as the original data. The True or False values in the boolean mask are determined by the criteria we define. For example, given the following subset of the credit card dataset, we set the condition that numerical values must be less than 40:
# Display original DataFrame
print('Original DataFrame')
print(df)
print('=' * 50)
# Get boolean mask (the element-wise comparison returns a boolean DataFrame)
bool_mask = df < 40
print('Boolean Mask')
print(bool_mask)
print('=' * 50)
# Apply condition of numerical values < 40 on entire DataFrame
output = df[bool_mask]
print('Filtered DataFrame')
print(output)
The output above displays two results:
A boolean mask with the same shape as the original DataFrame.
A filtered DataFrame in which numerical values that meet the criteria (i.e., have a True value in the corresponding boolean mask) are retained, while elements that don’t meet the criteria are replaced with NaN values.
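To see the same behavior without the credit card data, here is a minimal sketch on a small DataFrame. The column names and values are hypothetical, made up purely for illustration. It checks that the mask has the same shape as the data, that failing elements become NaN, and that indexing a DataFrame with a boolean DataFrame gives the same result as calling where() with that mask, which is the method covered next.

```python
import pandas as pd

# Hypothetical numeric data standing in for the credit card subset (values are made up)
df = pd.DataFrame({
    "age": [25, 45, 33],
    "tenure_months": [12, 60, 38],
})

# Element-wise comparison returns a boolean DataFrame with the same shape as df
bool_mask = df < 40
print(bool_mask)

# Indexing with the mask keeps values where the mask is True and puts NaN elsewhere
# (the columns become float64 because NaN is a float)
output = df[bool_mask]
print(output)

# For a boolean DataFrame key, df[mask] matches df.where(mask)
print(output.equals(df.where(bool_mask)))  # True
```

Note that the whole of row 1 is replaced with NaN because both of its values are 40 or greater. The row is not dropped, so the filtered result keeps the original shape.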
Filtering with where()
Building on the concept of boolean indexing, the where() method allows us to ...