Filtering
Filter DataFrames for values that fit certain conditions.
Chapter Goals:
- Understand how to filter a DataFrame based on filter conditions
- Write code to filter a dataset of MLB statistics
A. Filter conditions
In the Data Manipulation section, we used relation operations on NumPy arrays to create filter conditions. These filter conditions returned boolean arrays, which represented the locations of the elements that pass the filter.
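As a brief refresher, the sketch below shows a relation operation on a NumPy array producing a boolean array. The array name `hr` and its values are assumptions chosen for illustration, not data taken from the Data Manipulation section.

```python
import numpy as np

# Hypothetical home run totals, used only for illustration
hr = np.array([31, 39, 43, 38, 39])

# A relation operation returns a boolean array of the same shape
over_40 = hr > 40
print(over_40)

# Positions of the elements that pass the filter
positions = np.flatnonzero(over_40)
print(positions)
```

Only the third element exceeds 40, so it is the only position marked True.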
In pandas, we can also create filter conditions for DataFrames. Specifically, we can use relation operations on a DataFrame's column features, which will return a boolean Series representing the DataFrame rows that pass the filter.
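To make the return type concrete, here is a minimal sketch using a small made-up DataFrame (the name `df_small` and its values are assumptions for illustration). It checks that a relation operation on one column yields a boolean Series with one entry per row, sharing the DataFrame's index.

```python
import pandas as pd

# Small made-up DataFrame; names and values are illustrative only
df_small = pd.DataFrame({'teamID': ['BOS', 'SEA', 'SEA'],
                         'HR': [31, 39, 43]})

# Relation operation on a single column
over_35 = df_small['HR'] > 35

# The result is a pandas Series holding boolean values
print(type(over_35).__name__)
print(over_35.dtype)

# Its index labels match the DataFrame's row labels
print(over_35.index.equals(df_small.index))
```

Because the index labels line up, each boolean value can be read as the filter result for the row with the same label.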
The code below demonstrates how to use relation operations as filter conditions.
import pandas as pd

df = pd.DataFrame({
    'playerID': ['bettsmo01', 'canoro01', 'cruzne02', 'ortizda01', 'cruzne02'],
    'yearID': [2016, 2016, 2016, 2016, 2017],
    'teamID': ['BOS', 'SEA', 'SEA', 'BOS', 'SEA'],
    'HR': [31, 39, 43, 38, 39]})
print('{}\n'.format(df))

# True for rows whose playerID is 'cruzne02'
cruzne02 = df['playerID'] == 'cruzne02'
print('{}\n'.format(cruzne02))

# True for rows with more than 40 home runs
hr40 = df['HR'] > 40
print('{}\n'.format(hr40))

# True for rows whose team is not 'BOS'
notbos = df['teamID'] != 'BOS'
print('{}\n'.format(notbos))
In the code above, we created filter conditions for df based on the columns labeled 'playerID', 'HR', and 'teamID'. The boolean Series outputs have True for the rows that pass ...