Filtering
Explore advanced pandas filtering techniques focused on numerical data manipulation. Learn to use boolean indexing, the where() method for conditional selection and replacement, and the mask() method for inverse filtering. Understand parameters like inplace, axis, and level to efficiently manage DataFrame elements and MultiIndex alignment.
Recap of boolean indexing
Before we dive into filtering numerical values with the pandas methods where() and mask(), it’s worth revisiting the concept of boolean indexing. Boolean indexing is the technique of selecting data from a DataFrame based on an array of True/False values, so that only the elements of the original data whose corresponding value in the array is True are selected.
This array of True/False values is known as a boolean mask and has the same shape as the original data. The True or False values in the boolean mask are determined by the specific criteria we define. For example, we have the following subset of the credit card dataset, and we set a condition for numerical values to be less than 40:
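As a minimal sketch of this, assume a small hypothetical DataFrame standing in for the credit card subset (the column names and values below are illustrative only, not taken from the actual dataset). The code prints the boolean mask first and then the filtered DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for the credit card subset; all values are made up
df = pd.DataFrame(
    {
        "Customer_Age": [45, 38, 51, 29],
        "Months_on_book": [39, 44, 36, 21],
        "Dependent_count": [3, 5, 2, 0],
    }
)

# Boolean mask: True wherever the value is less than 40
bool_mask = df < 40
print(bool_mask)

# Boolean indexing: values with a True mask entry are kept,
# all other positions become NaN
filtered = df[bool_mask]
print(filtered)
```

Indexing a DataFrame with a boolean DataFrame of the same shape keeps the original shape, which is why the non-matching positions show up as NaN rather than being dropped.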
The output above displays two results:
A boolean mask with the same shape as the original DataFrame.
A filtered DataFrame where numerical values that meet the criteria (i.e., have a True value in the corresponding boolean mask) are retained, while the elements that don’t meet the criteria are replaced with NaN values.
Filtering with where()
Building on the concept of boolean indexing, the where() method allows us to ...