Filtering

Filter NumPy data for specific values.

Chapter Goals:

  • Learn how to filter data in NumPy
  • Write code for filtering NumPy arrays

A. Filtering data

Sometimes we have data that contains values we don't want to use. For example, when tracking the best hitters in baseball, we may want to only use the batting average data above .300. In this case, we should filter the overall data for only the values that we want.

Filtering relies on basic relational operations, e.g. ==, >, etc. In NumPy, these operations are applied element-wise on arrays, and each one returns a boolean array with the same shape as the input.

The code below shows relational operations on NumPy arrays. The ~ operator represents a boolean negation, i.e. it flips each truth value in the array.

import numpy as np

arr = np.array([[0, 2, 3],
                [1, 3, -6],
                [-3, -2, 1]])
# Element-wise comparisons return boolean arrays
print(repr(arr == 3))
print(repr(arr > 0))
print(repr(arr != 1))
# Negated from the previous step
print(repr(~(arr != 1)))
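
As a minimal sketch of the batting-average case mentioned earlier, the same element-wise comparison can be applied to a 1-D array. The array name averages and its values are made up for illustration and are not real statistics.

```python
import numpy as np

# Hypothetical batting averages, made up for illustration
averages = np.array([0.275, 0.312, 0.298, 0.341, 0.300])

# Element-wise comparison: True where the average is strictly above .300
above_300 = averages > 0.300
print(repr(above_300))

# Summing a boolean array counts its True values
print(above_300.sum())
```

Note that an average of exactly .300 is not marked True here, since > is a strict comparison; >= would include it.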

Note that relational operations cannot be used to locate np.nan. A np.nan value is not equal to anything, including itself, so arr == np.nan is False at every position. Instead, we use np.isnan to find the locations of np.nan.

The code below uses np.isnan to determine which locations of the array contain np.nan values.

import numpy as np

arr = np.array([[0, 2, np.nan],
                [1, np.nan, -6],
                [np.nan, -2, 1]])
# True wherever the element is np.nan
print(repr(np.isnan(arr)))
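
To see why a plain comparison is not enough for np.nan, the short sketch below contrasts == with np.isnan on a small array. The variable names eq_nan and is_nan are chosen here for illustration only.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0])

# Equality against np.nan is False everywhere, even at the NaN position
eq_nan = (arr == np.nan)
print(repr(eq_nan))

# np.isnan marks the NaN position with True
is_nan = np.isnan(arr)
print(repr(is_nan))

# Negating the mask with ~ marks the valid (non-NaN) entries instead
print(repr(~is_nan))
```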

Each boolean array in our examples marks the locations of the elements we want to filter for. The filtering itself is performed with the np.where function.

B. Filtering in NumPy

The np.where function takes in a required first argument, which is ...