Filtering Data
This lesson focuses on how to filter data with Pandas.
Filtering
Filtering is the process of extracting a subset of your data based on some condition or constraint. These conditions usually apply to the values in one or more columns. We filter data when we want to look at a smaller part of the whole dataset. For instance, we may want:
- the data in a particular period of the year
- the data for the highest-selling items
- the data for a specific group of items
- to remove extra or useless data
Data filtering is done on almost every dataset before doing any analysis. Let’s look at some examples using our California Housing Dataset.
import pandas as pd
df = pd.read_csv('housing.csv')
print(df.head())
Let’s say we want to see the data for all the housing blocks that are close to the ocean. From the output of the code above, we know that the ocean_proximity column contains text instead of numbers. We will first find out which distinct values appear in this column and then decide how to filter the rows we need.
import pandas as pd
df = pd.read_csv('housing.csv')

# Find all distinct values in ocean_proximity
unique_values = df['ocean_proximity'].unique()
print(unique_values)
We have used the function unique() on the ocean_proximity column in line 5 to obtain all the unique values in this ...
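Once the distinct values are known, one common way to keep only the matching rows is a boolean mask. The sketch below uses a small made-up DataFrame in place of housing.csv so that it runs on its own. The labels ("NEAR OCEAN", "NEAR BAY", "INLAND") and the numbers are illustrative assumptions; on the real data, use the labels that unique() actually prints.

```python
import pandas as pd

# Hypothetical stand-in for housing.csv; labels and values are made up for illustration.
df = pd.DataFrame({
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR OCEAN", "INLAND", "NEAR OCEAN"],
    "median_house_value": [452600, 98000, 310000, 120500, 275000],
})

# Comparing a column to a value yields a boolean Series (True where the condition holds).
near_ocean_mask = df["ocean_proximity"] == "NEAR OCEAN"

# Indexing the DataFrame with the mask keeps only the rows marked True.
near_ocean = df[near_ocean_mask]
print(near_ocean)

# isin() tests against several labels at once.
coastal = df[df["ocean_proximity"].isin(["NEAR OCEAN", "NEAR BAY"])]
print(len(coastal))
```

The filtered result keeps the original row labels, so near_ocean still carries the index values of the rows it came from. Call reset_index(drop=True) on it if a fresh 0-based index is preferred.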