Filtering
Filter NumPy data for specific values.
Chapter Goals:
- Learn how to filter data in NumPy
- Write code for filtering NumPy arrays
A. Filtering data
Sometimes we have data that contains values we don't want to use. For example, when tracking the best hitters in baseball, we may want to only use the batting average data above .300. In this case, we should filter the overall data for only the values that we want.
The key to filtering data is basic relational operations, e.g. ==, >, etc. In NumPy, we can apply these relational operations element-wise on arrays.
The code below shows relational operations on NumPy arrays. The ~ operation represents a boolean negation, i.e. it flips each truth value in the array.
import numpy as np

arr = np.array([[0, 2, 3],
                [1, 3, -6],
                [-3, -2, 1]])
print(repr(arr == 3))
print(repr(arr > 0))
print(repr(arr != 1))
# Negated from the previous step
print(repr(~(arr != 1)))
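As a small sketch of what these element-wise operations produce, the snippet below checks that a comparison yields a boolean array with the same shape as the input, and that negating `arr != 1` gives the same result as `arr == 1`. The variable names `mask`, `negated`, and `same` are chosen here for illustration only.

```python
import numpy as np

arr = np.array([[0, 2, 3],
                [1, 3, -6],
                [-3, -2, 1]])

# A comparison returns a boolean array shaped like the input.
mask = arr > 0
print(mask.shape, mask.dtype)  # (3, 3) bool

# Negating "not equal to 1" is the same as "equal to 1".
negated = ~(arr != 1)
same = np.array_equal(negated, arr == 1)
print(same)  # True
```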
Something to note is that we can't filter for np.nan with a relational operation: np.nan == np.nan evaluates to False, because NaN is not equal to anything, including itself. Instead, we use np.isnan to filter for the location of np.nan.
The code below uses np.isnan to determine which locations of the array contain np.nan values.
import numpy as np

arr = np.array([[0, 2, np.nan], [1, np.nan, -6], [np.nan, -2, 1]])
print(repr(np.isnan(arr)))
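To see why np.isnan is needed, the sketch below compares the same array against np.nan directly. The names `eq_mask`, `nan_mask`, and `not_nan` are illustrative and not part of the lesson's code.

```python
import numpy as np

arr = np.array([[0, 2, np.nan], [1, np.nan, -6], [np.nan, -2, 1]])

# Comparing against np.nan with == never matches, even at NaN positions.
eq_mask = arr == np.nan
print(eq_mask.any())  # False

# np.isnan marks the NaN positions correctly.
nan_mask = np.isnan(arr)
print(nan_mask.sum())  # 3

# Negating the mask marks the positions that hold real numbers.
not_nan = ~nan_mask
print(repr(not_nan))
```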
Each boolean array in our examples represents the location of elements we want to filter for. The way we perform the filtering itself is through the np.where function.
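As a minimal sketch tying this back to the batting average example, the snippet below builds a boolean array for averages above .300 and passes it to np.where as its only argument. In that one-argument form, np.where returns a tuple of index arrays marking the True positions. The `averages` values are made up for illustration.

```python
import numpy as np

# Hypothetical batting averages, invented for this example.
averages = np.array([0.250, 0.312, 0.298, 0.345, 0.301])

# Boolean array marking the averages above .300.
mask = averages > 0.300

# With a single argument, np.where returns a tuple of index arrays,
# one per dimension, giving the positions where mask is True.
indices = np.where(mask)
print(repr(indices))

# The indices can then be used to pull out the matching values.
print(repr(averages[indices]))
```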
B. Filtering in NumPy
The np.where function takes in a required first argument, which is ...