Basic Filtering

Explore fundamental filtering methods in Pandas, including single-condition filters, multiple-condition filters, and negation. Learn how to combine these techniques to select or exclude rows based on more complex criteria.

Filtering

One of the most useful features in Pandas is the ability to filter elements from a Series or DataFrame using very simple expressions. You may want to select travelers originating from specific countries in a travel dataset, or patients with specific health conditions in a medical dataset. In all such cases, you’re bound to use the filtering feature of Pandas in your data analysis.

Basics

The first, and easiest, way to filter elements in a DataFrame is square-bracket ([]) notation. Placing a condition inside the brackets lets you filter rows based on one or more conditions.
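The same bracket notation also works on a Series. As a minimal sketch, assuming a small made-up Series of play counts indexed by hypothetical artist names, a comparison produces a Boolean mask that the brackets then apply:

```python
import pandas as pd

# Made-up play counts, indexed by hypothetical artist names.
plays = pd.Series([450, 300, 700, 150], index=["Ash", "Birch", "Cedar", "Dune"])

# The comparison returns a Boolean Series with the same index...
mask = plays >= 300

# ...and passing it inside [] keeps only the entries where it is True.
popular = plays[mask]
print(popular)
```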

Example 1

You want to filter the music DataFrame to select only artists from the UK.

Python
import pandas as pd
df = pd.read_csv('music.csv')
uk_artists = df[df['country']=='UK']
print(uk_artists)

Given a DataFrame, you write a conditional expression inside the square brackets []. In this case, you want all the artists whose country of origin is the UK (df['country']=='UK'). This expression evaluates to a Boolean Series with one True or False value per row; placing it inside the brackets keeps only the rows where the value is True.
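To see the intermediate Boolean Series, the sketch below builds a small hypothetical DataFrame in place of music.csv. The country, genre, and plays columns follow the examples in this lesson; the artist column and all of the values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for music.csv (all values are made up).
df = pd.DataFrame({
    "artist": ["Ash", "Birch", "Cedar", "Dune", "Elm", "Fern"],
    "country": ["UK", "UK", "US", "US", "UK", "US"],
    "genre": ["rock", "pop", "rock", "pop", "rock", "rock"],
    "plays": [450, 300, 700, 900, 150, 250],
})

mask = df["country"] == "UK"  # Boolean Series aligned with df's index
print(mask.tolist())          # [True, True, False, False, True, False]

uk_artists = df[mask]         # keeps only the rows where mask is True
print(uk_artists)
```

The filtered result keeps the original index labels (0, 1 and 4 here); call reset_index(drop=True) on it if you need a fresh 0-based index.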

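You can also access the column as an attribute and write the condition as df.country == 'UK'. Attribute access only works when the column name is a valid Python identifier that doesn't clash with an existing DataFrame attribute or method, so the bracket form is the safer general choice.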
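A quick way to check that both spellings build the same mask, and to see where attribute access falls short, is a tiny hypothetical DataFrame whose "release year" column (an invented name) contains a space:

```python
import pandas as pd

# Hypothetical data; the second column name contains a space.
df = pd.DataFrame({
    "country": ["UK", "US", "UK"],
    "release year": [1997, 2003, 2011],
})

# Attribute access and bracket access build the same Boolean Series here.
same = (df.country == "UK").equals(df["country"] == "UK")
print(same)  # True

# A column name that isn't a valid identifier can only be reached with brackets.
recent = df[df["release year"] >= 2000]
print(recent)
```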

Multiple filters

Often you may want to apply multiple filters to a DataFrame. For a travel dataset, you might want to filter for young adults who originate from Europe. For a medical dataset, you might want to filter for senior citizens who have diabetes. Pandas lets you combine multiple conditions in a single expression.

Example 2

You want to filter the music DataFrame to select only the rock artists with 200 or more plays.

Python
import pandas as pd
df = pd.read_csv('music.csv')
out = df[(df['genre']=='rock') & (df['plays']>=200)]
print(out)

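You are still using conditional expressions, but with multiple conditions you must wrap each one in its own parentheses () and combine them with an element-wise Boolean operator: & (and) or | (or). The parentheses are required because & and | have higher precedence than comparison operators such as == and >=. Python's and/or keywords don't work here: they try to reduce a whole Series to a single True or False and raise a ValueError. In this case, you want both conditions to hold, so you use &.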
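The following sketch, using a hypothetical stand-in for music.csv with invented values, contrasts & with | and shows the error raised when Python's and keyword is used instead:

```python
import pandas as pd

# Hypothetical stand-in for music.csv (all values are made up).
df = pd.DataFrame({
    "artist": ["Ash", "Birch", "Cedar", "Dune", "Elm", "Fern"],
    "country": ["UK", "UK", "US", "US", "UK", "US"],
    "genre": ["rock", "pop", "rock", "pop", "rock", "rock"],
    "plays": [450, 300, 700, 900, 150, 250],
})

# &: both conditions must hold (rock AND at least 200 plays).
rock_200 = df[(df["genre"] == "rock") & (df["plays"] >= 200)]

# |: at least one condition must hold (pop OR more than 600 plays).
pop_or_hit = df[(df["genre"] == "pop") | (df["plays"] > 600)]

print(rock_200["artist"].tolist())    # ['Ash', 'Cedar', 'Fern']
print(pop_or_hit["artist"].tolist())  # ['Birch', 'Cedar', 'Dune']

# `and` asks for the truth value of a whole Series, which pandas refuses.
raised = False
try:
    df[(df["genre"] == "rock") and (df["plays"] >= 200)]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```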

Negation

So far, you’ve seen how to apply filters where you choose the rows that satisfy a condition. What if you now want to choose the rows that don’t match the condition you’ve supplied?

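You can do this by placing the ~ operator before a conditional expression. For instance, suppose you want to select artists from outside the UK. The following won't work, because ~ has higher precedence than ==, so it is applied to the country column itself rather than to the result of the comparison: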

df[~df['country']=='UK']

Wrapping the comparison in parentheses fixes this, because ~ is then applied to the Boolean Series the comparison produces:

df[~(df['country']=='UK')]
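As a small check on hypothetical data (invented artist names and countries), the parenthesized negation and a plain != comparison select the same rows:

```python
import pandas as pd

# Hypothetical data (all values are made up).
df = pd.DataFrame({
    "artist": ["Ash", "Birch", "Cedar", "Dune"],
    "country": ["UK", "US", "UK", "SE"],
})

# ~ flips the Boolean mask produced by the parenthesized comparison.
non_uk = df[~(df["country"] == "UK")]
print(non_uk["artist"].tolist())  # ['Birch', 'Dune']

# For a single condition, != selects exactly the same rows.
print(non_uk.equals(df[df["country"] != "UK"]))  # True
```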

For a single condition like this one, df['country'] != 'UK' gives the same result more simply. Negation is most useful when applied to a conditional expression that combines multiple filters.

Example 3

You want to exclude the most successful US bands, so you keep only the bands that are either:

  1. From outside the US, or
  2. From the US but with fewer than 500 plays.

In other words, you want to exclude bands that are from the US and have 500 or more plays.

Python
import pandas as pd
df = pd.read_csv('music.csv')
out = df[~((df.country=='US') & (df.plays >= 500))]
print(out)
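Because ~(A & B) is equivalent to ~A | ~B (De Morgan's law), the negated filter can also be written directly as the two cases listed above. A minimal sketch on hypothetical data, not the real music.csv, confirms that both forms keep the same rows:

```python
import pandas as pd

# Hypothetical stand-in for music.csv (all values are made up).
df = pd.DataFrame({
    "artist": ["Ash", "Birch", "Cedar", "Dune", "Elm", "Fern"],
    "country": ["UK", "UK", "US", "US", "UK", "US"],
    "plays": [450, 300, 700, 900, 150, 250],
})

# Exclude US artists with 500+ plays by negating the combined condition.
negated = df[~((df.country == "US") & (df.plays >= 500))]

# Same filter spelled out via De Morgan's law:
# outside the US, or fewer than 500 plays.
explicit = df[(df.country != "US") | (df.plays < 500)]

print(negated["artist"].tolist())  # ['Ash', 'Birch', 'Elm', 'Fern']
print(negated.equals(explicit))    # True
```

Which form to use is mostly a matter of readability: the negated version states what to exclude, while the explicit version states what to keep.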

Now that you've refreshed your knowledge of basic filtering, it's time to move on to more advanced challenges!